Zenseact is advancing towards an autonomous driving (AD) stack based on deep learning (DL) models that perform end-to-end sensor and temporal fusion. Training these models demands a significant computational budget that scales with in-vehicle compute capacity. DL models tend to operate safely only within the domain of their training data. Meanwhile, annotating multi-second data sequences is vastly more expensive than annotating single images, which makes it challenging to expand the operational domain of next-generation AD systems.
Manual annotation alone cannot meet these demands. Pseudo-annotations (algorithmically generated annotations) offer a solution: because they are produced offline, they are not constrained by real-time or embedded hardware requirements, which enables the use of very large models. To train such models without increasing annotation needs, we employ self-supervised learning, which trains models to understand relationships in the input data. Scaling this approach to large datasets and model sizes produces systems highly capable of solving downstream tasks. In summary, by increasingly leveraging large compute capacity, we can offload human annotation effort.
Ensuring the safety of DL models is another challenge, one that might only be tackled by simulating sensor data to evaluate performance under adverse conditions. Neural Radiance Fields (NeRFs) and 3D Gaussian splatting are emerging as critical technologies for enabling and scaling these simulations.
I completed my PhD in particle physics at CERN's CMS experiment, focusing on leveraging deep learning to enhance searches for new physics. Seeking a career with real-world impact, I transitioned to deep learning engineering at Zenseact, initially specializing in object detection. Later, I helped build the foundation for Zenseact's next-generation deep learning models that perform sensor and temporal fusion. This transition led to the creation of a new "Deep Learning Data Enrichment" area, which I now lead. This area aims to automate annotation processes with large offline models and to mine high-value data.