Scalable Forecasting in Google Cloud

Abstract

This is a practical, data-engineering-oriented talk on how to stand up a scalable, fully automated pipeline for time series forecasting using Facebook’s Prophet library and the standard big data and machine learning tools in Google Cloud.

I will do this in the context of Einride’s data platform, which is all about creating actionable insights that drive customers toward sustainable transport. As a real-world case study, I will show how we break down and understand transport demand in multiple dimensions – from a customer’s total demand down to thousands of individual sites and shipping lanes.

I will walk through the whole pipeline: extracting a production database dump, applying multiple tiers of data cleaning and transformation, using PySpark and Dataproc to parallelize model training and forecast generation, and orchestrating it all with Apache Airflow.
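The core idea behind parallelizing model training is that each time series (for example, one site or one shipping lane) can be fitted independently of the others. The sketch below is only an illustration of that fan-out pattern, not the pipeline presented in the talk: it uses the Python standard library instead of PySpark and Dataproc, and a simple mean-of-history stand-in instead of Prophet. The row layout, series keys, and function names are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical input rows: (series_key, week_index, shipments).
ROWS = [
    ("site-a", 0, 10.0), ("site-a", 1, 12.0), ("site-a", 2, 14.0),
    ("site-b", 0, 5.0), ("site-b", 1, 5.0), ("site-b", 2, 8.0),
]


def group_by_series(rows):
    """Collect observations per series key, ordered by time index."""
    groups = defaultdict(list)
    for key, _week, value in sorted(rows, key=lambda r: (r[0], r[1])):
        groups[key].append(value)
    return dict(groups)


def fit_and_forecast(key, history, horizon):
    """Stand-in model (NOT Prophet): repeat the historical mean."""
    level = mean(history)
    return key, [level] * horizon


def forecast_all(rows, horizon=2, workers=4):
    """Fit one independent model per series, fanning out across workers."""
    groups = group_by_series(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(fit_and_forecast, key, history, horizon)
            for key, history in groups.items()
        ]
        return dict(future.result() for future in futures)


forecasts = forecast_all(ROWS)
print(forecasts)
```

In a Spark setting, the grouping step would typically correspond to a group-by on the series key, with the per-group function running on the cluster's executors rather than in local threads.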

The key takeaway (and what’s really exciting) is how accessible and easy to use big data and machine learning tools have become – for tech giants and fledgling startups alike!

Oscar Söderlund

Chief Software Architect @ Einride

In 2018, Oscar left a cushy backend and data engineering job at Spotify to seek the thrill of a New Game+ experience in Gothenburg’s startup scene. Cue Einride, a crack team of technologists that set out to disrupt an outdated industry with sustainable transport solutions. Oscar has lately been working on building up Einride’s data platform capabilities and will share practical advice on building scalable data pipelines from scratch in Google Cloud.