Is duckdb a suitable alternative to polars? #58

afermg · 2024-12-13T21:45:41Z

The use case is trommel. The only polars functionality it needs is column selection (pl.select), which we do by dtype.

In theory we can replace it with duckdb's select_dtypes (or other selection methods). It may be slightly faster, but the main benefit is that we shed many dependencies, and the approach may become a portable way to handle the 'remove metadata -> process data -> append metadata' workflow.

afermg · 2024-12-17T17:51:33Z

Implementation note: converting a duckdb object to numpy is ~20% faster with numpy.array(list(X.fetchnumpy().values())) than with X.pl().to_numpy().
Surprisingly, there is no direct way to go from a duckdb table to a single numpy array, only to a dictionary of numpy arrays (one per column).
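One detail worth checking: numpy.array(list(d.values())) stacks each column as a row, so it yields shape (n_cols, n_rows), the transpose of what X.pl().to_numpy() returns. A minimal sketch of a helper (the name `columns_to_matrix` is hypothetical) that keeps the polars orientation with np.column_stack, assuming the columns contain no NULLs (duckdb may return masked arrays in that case). I haven't benchmarked whether this keeps the ~20% advantage.

```python
import numpy as np


def columns_to_matrix(cols):
    """Stack a {column_name: 1-D array} dict (the shape of duckdb's fetchnumpy() result)
    into one 2-D array of shape (n_rows, n_cols), matching polars' to_numpy() orientation."""
    # np.column_stack places each 1-D column side by side as a column of the result.
    return np.column_stack(list(cols.values()))
```

Usage would be columns_to_matrix(rel.fetchnumpy()) for a duckdb relation rel; alternatively, transpose the original expression with .T.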

afermg self-assigned this Dec 17, 2024
