This interview was recorded for the GOTO Book Club.
gotopia.tech/bookclub (http://gotopia.tech/bookclub)

Read the full transcription of the interview here (https://gotopia.tech/bookclub/episodes/234/Scaling-Machine-Learning-with-Spark)

Adi Polak (https://twitter.com/AdiPolak) - VP of Developer Experience at Treeverse & Contributor to lakeFS OSS
Holden Karau (https://twitter.com/holdenkarau) - Co-Author of "Kubeflow for Machine Learning" & many more books & Open Source Engineer at Netflix

DESCRIPTION
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.

Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.

You will:
• Explore machine learning, including distributed computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark with deep learning
• Get a step-by-step example of working with distributed TensorFlow
• Use PyTorch to scale machine learning and its internal architecture

* Book description: © O’Reilly (https://www.oreilly.com/library/view/scaling-machine-learning/9781098106812)

The interview is based on the book "Scaling Machine Learning with Spark" (https://amzn.to/3ppdUkB)

RECOMMENDED BOOKS
Adi Polak • Scaling Machine Learning with Spark (https://amzn.to/3ppdUkB)
Holden Karau, Trevor Grant, Boris Lublinsky, Richard Liu & Ilan Filonenko • Kubeflow for Machine Learning (https://amzn.to/3JVngcx)
Holden Karau • Distributed Computing 4 Kids (https://www.distributedcomputing4kids.com)
Holden Karau • Scaling Python with Dask (https://www.oreilly.com/library/view/scaling-python-with/9781098119867)
Holden Karau & Boris Lublinsky • Scaling Python with Ray (https://amzn.to/44GU6cC)
Holden Karau & Rachel Warren • High Performance Spark (https://amzn.to/3v2eLbn)
Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia • Learning Spark (https://amzn.to/397e2NE)
Holden Karau & Krishna Sankar • Fast Data Processing with Spark 2nd Edition (https://amzn.to/3xKhXKu) (https://amzn.to/3rHQgOu)

Bluesky (https://bsky.app/profile/gotocon.com)
Twitter (https://twitter.com/GOTOcon)
Instagram (https://www.instagram.com/goto_con)
LinkedIn (https://www.linkedin.com/company/goto-)
Facebook (https://www.facebook.com/GOTOConferences)

CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join

Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket: gotopia.tech (https://gotopia.tech)

SUBSCRIBE TO OUR YOUTUBE CHANNEL (https://www.youtube.com/user/GotoConferences/?sub_confirmation=1) - new videos posted daily!