Garbage data in, garbage models out: how to select the right data for robust machine learning

A lot of focus in machine learning is on the predictive performance of models, with articles touting how accuracy benchmarks are being broken by cutting-edge deep learning or gradient boosting algorithms. But machine learning's dirty secret is that models are only as good as the data you feed into them.

In this talk, I'll describe common pitfalls when creating a dataset for machine learning work, and how to spot and avoid them. We'll cover how to check that the fields you've selected actually measure what you think they're measuring, how to tell whether the data you've collected can support a good model, how to make sure your model will generalise to new data, and how to design experiments that answer the questions you're actually asking. These techniques will be presented in a language-agnostic manner, so you can apply them in your language or framework of choice.

Check out more of our featured speakers and talks at https://ndcconferences.com/ https://ndcoslo.com/
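One of the generalisation pitfalls the abstract alludes to is data leakage: computing preprocessing statistics on the full dataset before splitting it, which quietly inflates test scores. Below is a minimal, hypothetical sketch (the dataset and normalisation step are invented for illustration) of the safe ordering: split first, then derive statistics from the training portion only.

```python
import random

# Hypothetical dataset of (features, label) pairs, invented for illustration.
data = [([float(i), float(i % 7)], i % 2) for i in range(100)]

# Split BEFORE any preprocessing, so the test set stays untouched.
random.seed(0)
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Derive the normalisation statistic from the training set only...
train_mean = sum(x[0][0] for x in train) / len(train)

# ...then apply that same statistic to both sets. Computing a fresh mean on
# the test set (or on the combined data) would leak test information into
# the preprocessing step and overstate how well the model generalises.
train_scaled = [([x[0][0] - train_mean, x[0][1]], y) for x, y in [(d, d[1]) for d in train] for x in [d for d in [x]]] if False else \
               [([d[0][0] - train_mean, d[0][1]], d[1]) for d in train]
test_scaled = [([d[0][0] - train_mean, d[0][1]], d[1]) for d in test]
```

The key design choice is that `train_mean` is the only statistic that ever touches the test set; any pipeline step that "fits" something (scalers, encoders, feature selectors) should follow the same rule.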