This presentation was recorded at GOTO Copenhagen 2023.
https://gotocph.com

Tim Berglund, VP of DevRel at StarTree and author of "Gradle Beyond the Basics"
@tlberglund
@StarTree

RESOURCES
http://timberglund.com
https://twitter.com/tlberglund
https://www.linkedin.com/in/tlberglund
https://pinot.apache.org
https://twitter.com/startreedata
https://www.linkedin.com/company/startreedata
https://dev.startree.ai
https://stree.ai/slack

ABSTRACT
Apache Kafka has become the standard infrastructure for event-driven and streaming data systems. The stunningly simple abstraction of the distributed log provides exactly what modern microservices and real-time systems need, but no choice is without its tradeoffs. Logs are an excellent way to keep track of events, but they are notoriously difficult to query. Given a constellation of services exchanging events with each other and reacting to inputs in real time, how can you find out what has just happened and gain insight into it? How, in other words, do you query a log?

This is where Apache Pinot comes in. Developed at LinkedIn alongside Kafka, Pinot is a distributed, real-time analytics database designed to ingest data from Kafka (and other sources) and make it instantly queryable at low latency, even under a huge number of concurrent requests. All the data tucked neatly away in topics, which maintain an immutable record of how the state of the system has evolved, can now be ingested into Pinot and made accessible through simple SQL queries.

This talk explores Pinot's internal architecture, how its integration with Kafka is specially optimized, and how Pinot fits architecturally into the modern streaming stack. You'll leave understanding how Pinot works, how it fits together with Kafka, where it has been used successfully in the real world, and what steps to take next in your own Pinot learning journey. [...]
TIMECODES
00:00 Intro
02:57 A brief history
12:53 Pinot architecture
24:04 Indexes
32:29 Ingest
41:51 Remember our history
44:57 Outro

Download slides and read the full abstract here:
https://gotocph.com/2023/sessions/2900

RECOMMENDED BOOKS
Tim Berglund • Gradle Beyond the Basics • https://amzn.to/3fSjfMD
Tim Berglund & Matthew McCullough • Building and Testing with Gradle • https://amzn.to/3VaBY6g
Mark Needham • Building Real-Time Analytics Systems • https://amzn.to/41AOZJd
Gwen Shapira, Todd Palino, Rajini Sivaram & Krit Petty • Kafka: The Definitive Guide • https://amzn.to/41AVlrO
Adi Polak • Scaling Machine Learning with Spark • https://amzn.to/3N9vx1H