Predicting Response Time in a Large-Scale Information Retrieval System
Meltwater’s media intelligence platform executes millions of search queries every day. The Elasticsearch-based platform holds more than 40 billion documents and serves both search results and analytics. Queries can be either interactive (a human waiting on the result) or non-interactive (batch-like queries for asynchronous reports). While most queries take on the order of milliseconds, some are expensive and take seconds or even minutes to execute. These ‘slow’ queries hog machine resources and increase wait times for everything else, which degrades the user experience. A 10-millisecond query that has to wait 30 seconds before it runs makes for poor quality of service.
Accurately predicting the execution time of queries makes it possible to route slow queries to a separate machine pool, mitigating their negative impact on the rest. Historical query logs contain an abundance of data and features on which to train a machine learning model. However, the problem domain is complex and the data is noisy: thousands of queries run concurrently across more than 400 machines, and the interactions between them affect query runtime.
This talk will describe the problem and the challenges it poses from a machine learning perspective. It will then describe the approaches taken to solve it, with a particular focus on techniques and algorithms that are useful when working with large-scale, noisy data.
David Burke has worked professionally with machine learning for the past 9 years. He has been part of teams that successfully applied machine learning techniques in both the online and TV advertising domains at Admeta/WideOrbit, and he is now applying his knowledge to problems in the information retrieval domain at Meltwater. David has previously presented talks at several international workshops and conferences and, more recently (February 2017), hosted a machine learning meetup at WideOrbit’s Gothenburg office.
Machine learning in information retrieval is an exciting area, and one in which Meltwater wants to be at the forefront. This talk presents what we are doing on one specific project, but with potentially several more ML projects in the pipeline, we see Meltwater becoming an active participant in the Gothenburg machine learning community. We want to share our problems and solutions in order to generate discussions, build relationships, and continuously learn and improve in what we do.