This interview was recorded for the GOTO Book Club. #GOTOcon #GOTObookclub
http://gotopia.tech/bookclub

Read the full transcription of the interview here:
https://gotopia.tech/bookclub/episodes/234/Scaling-Machine-Learning-with-Spark

Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS @polakadi
Holden Karau - Co-Author of "Kubeflow for Machine Learning" & many more books & Open Source Engineer at Netflix @HoldenKarau

RESOURCES
Adi
https://twitter.com/AdiPolak
https://adipolak.substack.com
https://mastodon.online/@adipolak
https://blog.adipolak.com
https://www.linkedin.com/in/👋-adi-polak-68548365

Holden
https://twitter.com/holdenkarau
https://www.twitch.tv/holdenkarau
https://tech.lgbt/@holden
http://www.holdenkarau.com

DESCRIPTION
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals--allowing data and ML practitioners to collaborate and understand each other better.

Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.
You will:
• Explore machine learning, including distributed computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark with deep learning
• Get a step-by-step example of working with distributed TensorFlow
• Use PyTorch to scale machine learning and its internal architecture

* Book description: © O’Reilly: https://www.oreilly.com/library/view/scaling-machine-learning/9781098106812

The interview is based on the book "Scaling Machine Learning with Spark": https://amzn.to/3ppdUkB

TIMECODES
00:00 Intro
02:25 Lead with the tools & resources you have
04:06 The Apache Spark ecosystem
08:44 Book chapter overview
12:22 Exploring the glue spaces in ML & data engineering
19:18 Navigating the trade-offs of distributed ML
29:37 Challenges of keeping up with Open Source software
35:22 Can we expect another book?
38:11 Outro

RECOMMENDED BOOKS
Adi Polak • Scaling Machine Learning with Spark • https://amzn.to/3ppdUkB
Holden Karau, Trevor Grant, Boris Lublinsky, Richard Liu & Ilan Filonenko • Kubeflow for Machine Learning • https://amzn.to/3JVngcx
Holden Karau • Distributed Computing 4 Kids • https://www.distributedcomputing4kids.com
Holden Karau • Scaling Python with Dask • https://www.oreilly.com/library/view/scaling-python-with/9781098119867
Holden Karau & Boris Lublinsky • Scaling Python with Ray • https://amzn.to/44GU6cC
Holden Karau & Rachel Warren • High Performance Spark • https://amzn.to/3v2eLbn
Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia • Learning Spark • https://amzn.to/397e2NE
Holden Karau & Krishna Sankar • Fast Data Processing with Spark 2nd Edition • https://amzn.to/3xKhXKu
Holden Karau • Fast Data Processing with Spark 1st Edition • https://amzn.to/3rHQgOu

https://twitter.com/GOTOcon
https://www.linkedin.com/company/goto-
https://www.instagram.com/goto_con
https://www.facebook.com/GOTOConferences

#Spark #ApacheSpark #ML #MachineLearning #MLlib #TensorFlow #PyTorch #DataScience #AI #ComputerScience #AdiPolak #HoldenKarau #Programming #SoftwareEngineering

CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks:
https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join

Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket at https://gotopia.tech
Sign up for updates and specials at https://gotopia.tech/newsletter

SUBSCRIBE TO OUR CHANNEL - new videos posted almost daily.
https://www.youtube.com/user/GotoConferences/?sub_confirmation=1