DataExpert.io Newsletter

DuckDB benchmarked against Spark

You Don't Always Need A Sledgehammer

High Performance DE Newsletter and Zach Wilson
Sep 22, 2025
∙ Paid

Introduction

Apache Spark has been the de facto open source data processing engine for fifteen years. It was invented to solve a major problem that traditional data warehousing was not built to solve: processing massive amounts of data horizontally at scale (Zach used Spark to process 2,000 TB per day at Netflix), whether in a structured or semi-structured for…

© 2025 Zach Wilson