LINGO & GAIA: Transforming Autonomous Driving with Large Language Models and Generative AI
In this talk, I will present LINGO and GAIA, two recent models developed at Wayve. LINGO is a pioneering open-loop driving commentator that uses natural language processing to interpret and articulate driving scenes. Its unique “show and tell” feature employs referential segmentation to visually highlight the areas of a scene that the model is describing, grounding its commentary in what it actually sees. This capability advances autonomous vehicle (AV) technology by improving the accuracy of scene descriptions, addressing model hallucinations, and bolstering safety communication. By integrating vision, language, and action, LINGO represents a critical step towards Vision-Language-Action Models (VLAMs), aiming for human-like communication and trustworthiness in AV technology.
GAIA is an advanced generative world model designed to simulate realistic driving scenarios. It leverages video, text, and action inputs to build representations of the environment and its future dynamics, with the aim of enhancing AV decision-making and safety. Comprising an image encoder and a 6.5-billion-parameter autoregressive transformer trained on extensive driving data, GAIA demonstrates scalability and high-quality video generation. Its ability to adjust scene features such as weather and time of day, to predict diverse futures, and to model interactions with other agents positions GAIA as a valuable tool for AV development.
Remi Tachet des Combes is a senior applied scientist at Wayve, a UK-based startup specialising in the development of artificial intelligence systems for self-driving vehicles. There, he focuses on world modelling and representation learning for autonomy, with the ultimate goal of solving Embodied AI. Prior to Wayve, Remi was a principal researcher at Microsoft Research, where he made several contributions to the fields of reinforcement learning and deep learning. Remi holds a PhD in applied mathematics and has worked at the MIT Senseable City Lab, studying the impact and benefits of technology on urban planning.