More and more companies are becoming data and AI driven. Data is the lifeblood of many of the systems we build and of the business decisions made today. Customer experience and the customer journey, which in turn drive the P&L for businesses, rely on the data that is captured and fed into our systems. It is imperative that this data is of the highest quality and stays that way. The quality of data has a direct impact on the quality, accuracy and relevance of ML model output. It also has a proportional impact on the cost of running data engineering pipelines, whether they process stream or batch data.

Following the Data Mesh pattern of building platform capabilities that power decentralised data products, I want to lay out an approach to implementing Data Quality at scale, along with the key steps in providing confidence and trust in the data being produced and consumed by data product teams.

In this talk, I will cover:
- Data Quality challenges in modern data-driven enterprises, from both the stream and the batch perspective
- Dimensions and metrics of Data Quality
- The key parts of a Data Quality platform, and an approach to building one at scale that provides near-real-time visibility of DQ issues
- Fitting this capability into the data ecosystem, including triggering remediation actions such as stopping a data pipeline

Join us at Devoxx UK 2022 for even more great quality content. https://www.devoxx.co.uk