Invalid Input Error: There is no query result #70

Open
kimmolinna opened this issue Mar 14, 2025 · 4 comments

kimmolinna commented Mar 14, 2025

import duckdb
import quak
df = duckdb.sql("from read_parquet('https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet')")
quak.Widget(df)

If I run the last command a second time in Jupyter Notebook or marimo, I get the following error message:

in Widget.__init__(self, data, table)
     52     arrow_table = data.to_arrow()
     53 elif has_pycapsule_stream_interface(data):
     54     # NOTE: for now we materialize the input into an in-memory Arrow table,
     55     # so that we can perform repeated queries on that. In the future, it may
     56     # be better to keep this Arrow stream non-materalized in Python and
     57     # create a new DuckDB table from the stream.
     58     # arrow_table = pa.RecordBatchReader.from_stream(data)
---> 59     arrow_table = pa.table(data)
     60 elif is_arrow_ipc(data):
     61     arrow_table = arrow_table_from_ipc(data)

File ...\site-packages\pyarrow\table.pxi:6159, in pyarrow.lib.table()

I'm using quak 0.2.2, pyarrow 19.0.1, and Python 3.13.2 on a managed Windows 11 machine.

@kimmolinna
Author

An easy workaround is to convert the result to a Polars DataFrame:

df = duckdb.sql("from read_parquet('https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet')").pl()

@kylebarron
Collaborator

This is a known "problem" with the Arrow PyCapsule interface: DuckDB exposes an Arrow stream, but that stream can only be consumed once. There's still discussion about this upstream, but it's currently intentional that the last line fails the second time you run it, because the query result has already been consumed.
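To illustrate the one-shot behavior, here is a rough analogy using a plain Python generator rather than DuckDB itself. `query_result` and its rows are hypothetical stand-ins. One difference to keep in mind: an exhausted generator silently yields nothing, whereas the DuckDB relation in this issue raises an error on the second read.

```python
from typing import Iterator


def query_result() -> Iterator[dict]:
    """Hypothetical stand-in for a one-shot stream of result rows."""
    yield {"name": "A", "height": 1.80}
    yield {"name": "B", "height": 1.65}


stream = query_result()
first_pass = list(stream)   # reads every row and exhausts the stream
second_pass = list(stream)  # nothing is left to read

# Materializing once gives a value that can be read any number of times.
materialized = list(query_result())
names = [row["name"] for row in materialized]
names_again = [row["name"] for row in materialized]
```

This is presumably why converting to a materialized object first (the Polars workaround above) sidesteps the error: the widget receives data that can be read repeatedly, not a stream that has already been drained.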

@manzt
Owner

manzt commented Mar 14, 2025

Some more discussion here:

One option to support out-of-core use is to create a view of the data source and pass in your own DuckDB connection:

import duckdb
import quak

conn = duckdb.connect(":memory:")
conn.sql("CREATE VIEW df AS SELECT * FROM 'https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet'")
quak.Widget(conn, table="df")

With the remote dataset, though, the latency here isn't very good, and creating a TABLE in memory is probably the best user experience. For larger Parquet datasets on local disk, views work quite well.

wget https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet

import duckdb
import quak

conn = duckdb.connect(":memory:")
conn.sql("CREATE VIEW df AS SELECT * FROM 'athletes.parquet'")
quak.Widget(conn, table="df")
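For the in-memory TABLE route mentioned above, a minimal sketch might look like the following. `create_table_sql` and `open_widget` are hypothetical helper names, not part of quak or DuckDB, and the quoting logic is an assumption added for safety. The only quak call used is `quak.Widget(conn, table=...)`, as in the snippets above.

```python
def create_table_sql(table_name: str, source: str) -> str:
    """Build a statement that copies `source` into a DuckDB table."""
    # Identifiers are double-quoted and string literals single-quoted,
    # with embedded quote characters doubled.
    quoted_name = '"' + table_name.replace('"', '""') + '"'
    quoted_source = "'" + source.replace("'", "''") + "'"
    return f"CREATE OR REPLACE TABLE {quoted_name} AS SELECT * FROM {quoted_source}"


def open_widget(source: str, table_name: str = "df"):
    """Hypothetical helper: load `source` into memory, then open a widget on it."""
    # Imported lazily so the SQL helper above can be used without these packages.
    import duckdb
    import quak

    conn = duckdb.connect(":memory:")
    # Unlike a VIEW, a TABLE reads the source once, up front.
    conn.execute(create_table_sql(table_name, source))
    return quak.Widget(conn, table=table_name)
```

Calling `open_widget("athletes.parquet")` in a notebook cell should then pay the download or read cost once, so later widget queries run against the in-memory copy. The trade-off is that the whole dataset has to fit in memory.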

@kimmolinna
Author

I'm not sure what @mscolnick did with marimo, but if I have two separate cells, the first being

import duckdb
import quak

df = duckdb.sql("""from read_parquet("https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet")""")
widget = quak.Widget(df) 

and the second one is just

widget

I don't have any problem rerunning the cells. BTW, I really like the widget.data() command for a quick view.
