Retrieval-augmented generation (RAG) systems combine external information sources with large language models (LLMs), promising more factual and context-aware outputs. However, evaluating their performance remains a significant challenge: it requires judging not only the quality of the generated text but also the relevance and correctness of the retrieved information. Recent work has introduced new frameworks and metrics, but the field has not yet converged on widely accepted standards for robust evaluation.
In this talk, we survey current approaches to RAG evaluation and discuss key open questions from ongoing research in this area. Our goal is to encourage more systematic methods for measuring and comparing RAG performance in a rapidly advancing field.
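To make the two-sided evaluation problem concrete, the minimal Python sketch below scores a single RAG example on both axes: retrieval quality via precision@k, and generation quality via token-overlap F1 against a reference answer. This is purely illustrative and not the method presented in the talk; the function names, example documents, and labels are hypothetical placeholders.

```python
# Toy two-sided RAG evaluation: score the retriever and the generator
# separately for one question. All data here is made up for illustration.

def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved documents that are labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / len(top_k)

def token_f1(prediction, reference):
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred_tokens = set(prediction.lower().split())
    ref_tokens = set(reference.lower().split())
    common = pred_tokens & ref_tokens
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: retrieved document IDs, human relevance labels,
# a generated answer, and a reference answer.
retrieved = ["doc3", "doc7", "doc1"]
relevant = {"doc3", "doc1"}
answer = "RAG grounds generation in retrieved documents"
reference = "RAG grounds the model's generation in retrieved documents"

print(f"precision@3: {precision_at_k(retrieved, relevant, k=3):.2f}")
print(f"answer F1:   {token_f1(answer, reference):.2f}")
```

Even this toy setup shows why evaluation is hard: the two scores can disagree (good retrieval with an unfaithful answer, or vice versa), and simple lexical overlap misses semantically correct paraphrases, which is what motivates the newer frameworks and metrics discussed in the talk.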
Yue is a Machine Learning (ML) Engineer at Modulai, an ML consultancy in Sweden, where she has worked on projects in the healthcare, legal, and finance sectors. Before joining Modulai, Yue’s PhD research at KTH focused on applying AI models to breast cancer risk assessment and detection in mammograms. Earlier, she pursued her Master’s in Computer Science at KTH in Sweden and TU Delft in the Netherlands.