Exploration and evaluation of reinforcement learning in production
A common assertion is that reinforcement learning (RL) does not work in production systems because of the exploratory nature of agents, but can some of these concerns be mitigated? This talk covers issues with exploration and evaluation in production RL systems, as well as mitigation strategies: improving sample efficiency (for example, through transfer learning or distributed/federated learning), safe exploration, and off-policy evaluation.
Jesper is a senior engineer at Ericsson, with a career spanning software design, architecture, research, and, now, machine learning. At Ericsson, he has worked on applying machine learning to improve engineering processes, in areas such as fault prediction and statistical analysis of code complexity. Most recently, he has used reinforcement learning in production, specifically for auto-scaling cloud resources. These experiences form the basis of his talk at the conference.