This presentation was recorded at YOW! 2017. #GOTOcon #YOW
https://yowcon.com

Juliet Hougland - Data Science Tech Lead for Engineering at Cloudera @juliethougland325

RESOURCES
https://www.linkedin.com/in/jhlch
https://mastodon.social/@[email protected]
https://x.com/j_houg
https://github.com/hougs
https://www.baggedandboosted.com/lander

ABSTRACT
Apache Spark is a general-purpose framework for distributed data processing. With MLlib, Spark’s machine learning library, fitting a model to a huge data set becomes very easy. Similarly, Spark’s general-purpose functionality enables application of a model across a large collection of observations. We’ll walk through fitting a model to a big data set using MLlib and applying a trained scikit-learn model to a large data set. [...]

RECOMMENDED BOOKS
Adi Polak • Machine Learning with Apache Spark • https://amzn.to/3ppdUkB
Damji, Wenig, Das & Lee • Learning Spark • https://amzn.to/4g555RU
Bill Chambers & Matei Zaharia • Spark: The Definitive Guide • https://amzn.to/3OqVj0Y