This presentation was recorded at GOTO Amsterdam 2015.
http://gotoams.nl

Sean Owen - Director of Data Science at Cloudera

ABSTRACT
Apache Spark continues to gain momentum as the new processing paradigm for Apache Hadoop, and for the data scientist, it has a lot to like: natively distributed, REPL, Python APIs in addition to native Scala, and a library of machine learning [...]

TIMECODES
0:00 Introduction
3:31 Feature Vectors in the Iris Data Set
6:33 Good Pet Data Set
7:40 Possible Decision Trees
10:00 Interpreting Models
15:51 Building a Decision Tree in MLlib
19:40 Evaluating a Decision Tree
22:56 Better Than Random Guessing?
26:54 Decisions Should Make Lower Impurity Subsets
29:12 Tuning Hyperparameters
38:20 How to Create a Crowd?
39:06 Trees See Subsets of Examples
39:51 Or Subsets of Features
41:06 Diversity of Opinion
41:39 Random Decision Forests

Download slides and read the full abstract here:
http://gotocon.com/amsterdam-2015/presentation/A%20Taste%20of%20Random%20Decision%20Forests%20on%20Apache%20Spark