Online Machine Learning With RiverML
A growing number of data teams have to deal with real-time data feeds. Handling these feeds is challenging. Part of the difficulty is habit: data processing is usually done in batches. This is very much the case for machine learning, which involves two steps: inference and learning. Both of these can be done online. But how? What design patterns does this involve? What software components are necessary? What does this look like in day-to-day practice? We’ll try to refine these questions and answer them during this talk. In particular, we’ll focus on River, which is a Python package for online machine learning. We’ll also discuss the higher-level tools which are necessary to deploy an online machine learning model into production.
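To make "online" concrete, here is a minimal sketch of the predict-then-learn loop, written in plain Python so it runs without any dependencies. The method names `predict_one` and `learn_one` and the use of a dict per sample mirror River's conventions, but the `OnlineLinearRegression` class, the synthetic `stream` function, and the learning rate are hypothetical and chosen only for illustration; this is not River's implementation.

```python
from collections import defaultdict


class OnlineLinearRegression:
    """Hypothetical pure-Python model, not part of River."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # one weight per feature name
        self.intercept = 0.0

    def predict_one(self, x):
        # Inference on a single sample, given as a dict of features.
        return self.intercept + sum(self.weights[k] * v for k, v in x.items())

    def learn_one(self, x, y):
        # One stochastic gradient step on the squared error for this sample.
        error = self.predict_one(x) - y
        self.intercept -= self.learning_rate * error
        for k, v in x.items():
            self.weights[k] -= self.learning_rate * error * v


def stream(n=2000):
    """Synthetic, noise-free feed (an assumption): y = 2 * a - b + 1."""
    for i in range(n):
        a = (i % 7) / 7
        b = (i % 5) / 5
        yield {"a": a, "b": b}, 2 * a - b + 1


model = OnlineLinearRegression()
abs_errors = []

for x, y in stream():
    y_pred = model.predict_one(x)       # inference first, on an unseen sample
    abs_errors.append(abs(y - y_pred))  # score the prediction before learning
    model.learn_one(x, y)               # then learning, one sample at a time

early = sum(abs_errors[:100]) / 100
late = sum(abs_errors[-100:]) / 100
print(f"mean absolute error, first 100 samples: {early:.4f}")
print(f"mean absolute error, last 100 samples: {late:.4f}")
```

Because each sample is scored before the model learns from it, the running error doubles as an out-of-sample estimate, and the model never needs to see the whole dataset at once.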
Max is a data scientist, currently working at Carbonfact. He holds a PhD in machine learning applied to query optimisation in database systems. He develops and researches online machine learning algorithms in his spare time. Max is fond of open source software, and maintains a blog where he discusses some of the things he’s working on.