Jun 05, 2024

DuckDB

Lightweight in-memory/disk OLAP database

DuckDB is currently my database of choice. It’s relatively new (2019), simple to use, and powerful enough to do fun things with.

In contrast to SQLite, which is older and geared towards OLTP (transaction-heavy) work, DuckDB is an OLAP database designed for efficient reading and analytical work. ChatGPT tells me SQLite is better for write-heavy workloads and efficient reads of individual rows, and uses a row-oriented storage format, whereas DuckDB stores data by column, which suits analysis and other read-heavy work. I have never needed to use SQLite much, so I can’t say much more.

By default, it runs in-memory; give it a filename, and it creates a persistent database file instead. The Python API is the one I’ve used (there are clients for many languages, including Java). It has a lot of support for file operations and data frames, and generally plays well with data.

I’m using it right now for a little finances application, downloading bank statements, parsing them into a standard format, and then writing to DuckDB. We use it at work for pulling down data from S3 and querying it locally.