A classic data lakehouse is built on open-source table formats such as Delta Lake, Apache Iceberg, or Apache Hudi and integrates with big data platforms like Apache Spark and event buses like Apache Kafka or Amazon Kinesis. The popularity of the data lakehouse stems from its ability to combine the quality, speed, and simple SQL access of data warehouses with the cost-effectiveness, scalability, and support for unstructured data of data lakes.

With the advent of generative AI models and the potential of techniques such as retrieval-augmented generation (RAG) combined with fine-tuning or pre-training custom LLMs, a new paradigm emerged in 2023: the AI-infused lakehouse. These platforms use generative AI for code generation, natural-language queries, semantic search, and LLM callouts from SQL, as well as for enhancing governance and automating documentation. How do lakehouses adapt to the integration of these new AI capabilities?

The live demonstration will include the continuous ingestion of IoT events through a declarative, serverless data pipeline, processing events from hundreds of phones in the audience at a rate of around 100 million events per day. This talk is for data architects who are not afraid of some code, for data engineers who love open source and cloud services, and for practitioners who enjoy a fun end-to-end demo. The Databricks Lakehouse is used for the demos.
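As a rough illustration of what such a declarative streaming ingestion pipeline can look like, the sketch below uses Delta Live Tables (the declarative pipeline framework in the Databricks Lakehouse) with Spark Structured Streaming reading from Kafka. The broker address, topic name, and event schema are hypothetical placeholders, not the actual demo setup.

```python
# Minimal sketch of a declarative ingestion pipeline with Delta Live Tables.
# Broker, topic, and schema below are illustrative assumptions.
import dlt
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

# Hypothetical schema for a phone-generated IoT event.
event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("latitude", DoubleType()),
    StructField("longitude", DoubleType()),
])

@dlt.table(comment="Raw IoT events ingested continuously from Kafka")
def raw_iot_events():
    # `spark` is the SparkSession provided by the Databricks runtime.
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "iot-events")                  # placeholder topic
        .load()
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("event"))
        .select("event.*")
    )

@dlt.table(comment="Events that pass basic data-quality expectations")
@dlt.expect_or_drop("valid_device", "device_id IS NOT NULL")
@dlt.expect_or_drop("valid_time", "event_time IS NOT NULL")
def clean_iot_events():
    return dlt.read_stream("raw_iot_events")

@dlt.table(comment="Per-device event counts, recomputed as a materialized view")
def events_per_device():
    return dlt.read("clean_iot_events").groupBy("device_id").count()
```

The declarative part is that each table is just a function returning a DataFrame; the pipeline runtime wires the dependencies, manages checkpoints, and enforces the data-quality expectations, rather than the engineer orchestrating those steps by hand.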