Insights: apache/datafusion
November 26, 2024 – December 26, 2024
Overview
169 Pull requests merged by 64 people
-
Introduce LogicalPlan invariants, begin automatically checking them
#13651 merged
Dec 26, 2024 -
ci improvements, update protoc
#13876 merged
Dec 26, 2024 -
fix: use `get_record_batch_memory_size` for calculating RecordBatch memory size in topK
#13906 merged
Dec 26, 2024 -
chore(deps): update sqllogictest requirement from 0.23.0 to 0.24.0
#13902 merged
Dec 26, 2024 -
Preserve constant values across union operations
#13805 merged
Dec 25, 2024 -
Introduce `UserDefinedLogicalNodeUnparser` for User-defined Logical Plan unparsing
#13880 merged
Dec 25, 2024 -
Changed the url for downloading IMDB dataset from benchmark - Fixed Issue #13896
#13903 merged
Dec 25, 2024 -
fix case_column_or_null with nullable when conditions
#13886 merged
Dec 25, 2024 -
Support unparsing implicit lateral `UNNEST` plan to SQL text
#13824 merged
Dec 25, 2024 -
Prepare for 44.0.0 release: version and changelog
#13882 merged
Dec 25, 2024 -
Minor: Avoid emitting empty batches in partial sort
#13895 merged
Dec 25, 2024 -
Fix visibility of `swap_hash_join` to be `pub`
#13899 merged
Dec 24, 2024 -
Fix `recursive-protection` feature flag
#13887 merged
Dec 24, 2024 -
Minor: change visibility of hash join utils
#13893 merged
Dec 24, 2024 -
Minor: change the sort merge join emission as incremental
#13894 merged
Dec 24, 2024 -
Support (order by / sort) for DataFrameWriteOptions
#13874 merged
Dec 24, 2024 -
Support 1 or 3 arg in generate_series() UDTF
#13856 merged
Dec 24, 2024 -
minor: fix typos in comments / structure names
#13879 merged
Dec 23, 2024 -
chore: Consolidate Example: simplify_udwf_expression.rs into advanced_udwf.rs
#13883 merged
Dec 23, 2024 -
Improve error messages for incorrect zero argument signatures
#13881 merged
Dec 23, 2024 -
Restore `DocBuilder::new()` to avoid breaking API change
#13870 merged
Dec 22, 2024 -
Consolidate Example: dataframe_output.rs into dataframe.rs
#13877 merged
Dec 22, 2024 -
Minor: remove unused async-compression `futures-io` feature
#13875 merged
Dec 22, 2024 -
[minor] make recursive package dependency optional
#13778 merged
Dec 22, 2024 -
Support unicode character for `initcap` function
#13752 merged
Dec 22, 2024 -
Add documentation for `SHOW FUNCTIONS`
#13868 merged
Dec 21, 2024 -
Minor: improve error message when ARRAY literals can not be planned
#13859 merged
Dec 21, 2024 -
Minor: fix: Include FetchRel when producing LogicalPlan from Sort
#13862 merged
Dec 21, 2024 -
feat(substrait): modular substrait consumer
#13803 merged
Dec 21, 2024 -
Fix build `use of undeclared type ShowStatementFilter`
#13869 merged
Dec 21, 2024 -
Update bzip2 requirement from 0.4.3 to 0.5.0
#13740 merged
Dec 21, 2024 -
Implement `SHOW FUNCTIONS`
#13799 merged
Dec 21, 2024 -
Minor: Unify `downcast_arg` method
#13865 merged
Dec 21, 2024 -
Improve SortPreservingMerge::enable_round_robin_repartition docs
#13826 merged
Dec 20, 2024 -
feat(function): add `least` function
#13786 merged
Dec 20, 2024 -
Minor: Use `resize` instead of `extend` for static values in SMJ logic
#13861 merged
Dec 20, 2024 -
Upgrade to sqlparser `0.53.0`
#13767 merged
Dec 20, 2024 -
feat: support normalized expr in CSE
#13315 merged
Dec 20, 2024 -
Improve `Signature` and `comparison_coercion` documentation
#13840 merged
Dec 20, 2024 -
Add configurable normalization for configuration options and preserve case for S3 paths
#13576 merged
Dec 20, 2024 -
fix: enable DF's nested_expressions feature by in datafusion-substrait tests to make them pass
#13857 merged
Dec 20, 2024 -
replace CASE expressions in predicate pruning with boolean algebra
#13795 merged
Dec 20, 2024 -
Preserve ordering equivalencies on `with_reorder`
#13770 merged
Dec 20, 2024 -
Replace `execution_mode` with `emission_type` and `boundedness`
#13823 merged
Dec 20, 2024 -
Support n-ary monotonic functions in ordering equivalence
#13841 merged
Dec 20, 2024 -
MINOR: typo -- remove extra "`" interfering with doc formatting
#13847 merged
Dec 19, 2024 -
typo: remove extraneous "`" in doc comment, fix header
#13848 merged
Dec 19, 2024 -
Update substrait requirement from 0.49 to 0.50
#13808 merged
Dec 19, 2024 -
Add example of interacting with a remote catalog
#13722 merged
Dec 19, 2024 -
[bugfix] ScalarFunctionExpr does not preserve the nullable flag on roundtrip
#13830 merged
Dec 19, 2024 -
Rename `TypeSignature::NullAry` --> `TypeSignature::Nullary` and improve comments
#13817 merged
Dec 19, 2024 -
Minor: Replace `BooleanArray::extend` with `append_n`
#13832 merged
Dec 19, 2024 -
feat: `parse_float_as_decimal` supports scientific notation and Decimal256
#13806 merged
Dec 19, 2024 -
chore: temporarily disable Windows Rust flow
#13833 merged
Dec 19, 2024 -
Minor: Extend ScalarValue::new_zero()
#13828 merged
Dec 18, 2024 -
Support 'NULL' as Null in csv parser.
#13228 merged
Dec 18, 2024 -
Handle possible overflows in StringArrayBuilder / LargeStringArrayBuilder
#13802 merged
Dec 18, 2024 -
Chore: Do not return empty record batches from streams
#13794 merged
Dec 18, 2024 -
Fix get_type for higher-order array functions
#13756 merged
Dec 18, 2024 -
Handle empty rows for `array_distinct`
#13810 merged
Dec 17, 2024 -
fix: pruning by bloom filters for dictionary columns
#13768 merged
Dec 17, 2024 -
Fix `ScalarValue::to_array_of_size` for DenseUnion
#13797 merged
Dec 17, 2024 -
Minor: cargo update in datafusion-cli
#13801 merged
Dec 17, 2024 -
Minor: improve `Analyzer` docs
#13798 merged
Dec 16, 2024 -
fix: Limit together with pushdown_filters
#13788 merged
Dec 16, 2024 -
Add Round trip tests for Array <--> ScalarValue
#13777 merged
Dec 16, 2024 -
Update documentation guidelines for contribution content
#13703 merged
Dec 16, 2024 -
[minor] add missing slt tests for count(partitioned,aggregated, aggregated cube)
#13790 merged
Dec 16, 2024 -
Revert the removal of reservation in HashJoin
#13792 merged
Dec 16, 2024 -
fix: add `null_buffer` length check to `StringArrayBuilder`/`LargeStringArrayBuilder`
#13758 merged
Dec 15, 2024 -
Improve Deprecation Guidelines more
#13776 merged
Dec 15, 2024 -
docs: update GroupsAccumulator instead of GroupAccumulator
#13787 merged
Dec 15, 2024 -
Minor: Add some more blog posts to the readings page
#13761 merged
Dec 14, 2024 -
Simplify type signatures using `TypeSignatureClass` for mixed type function signature
#13372 merged
Dec 14, 2024 -
fix: specify roottype in substrait fieldreference
#13647 merged
Dec 13, 2024 -
Minor: improve the Deprecation / API health guidelines
#13701 merged
Dec 13, 2024 -
fix: Implicitly plan `UNNEST` as lateral
#13695 merged
Dec 13, 2024 -
chore: clean up dependencies
#13728 merged
Dec 13, 2024 -
Update to bigdecimal 0.4.7
#13747 merged
Dec 13, 2024 -
Minor: Remove memory reservation in `JoinLeftData` used in HashJoin
#13751 merged
Dec 13, 2024 -
doc-gen: migrate window functions documentation to attribute based
#13739 merged
Dec 13, 2024 -
Support sqllogictest --complete with postgres
#13746 merged
Dec 13, 2024 -
Minor: Add documentation explaining that initcap only works for ASCII
#13749 merged
Dec 13, 2024 -
Optimize performance of `initcap` function (~2x faster)
#13691 merged
Dec 12, 2024 -
Add tests for date_part on columns + timestamps with / without timezones
#13732 merged
Dec 12, 2024 -
Minor: make unsupported `nanosecond` part a real (not internal) error
#13733 merged
Dec 12, 2024 -
fix union serialisation order in proto
#13709 merged
Dec 12, 2024 -
Implement GroupsAccumulator for corr(x,y) aggregate function
#13581 merged
Dec 12, 2024 -
Minor: Add doc example to RecordBatchStreamAdapter
#13725 merged
Dec 12, 2024 -
Update to apache-avro 0.17, fix compatibility changes schema handling
#13727 merged
Dec 12, 2024 -
Support unparsing `UNNEST` plan to `UNNEST` table factor SQL
#13660 merged
Dec 11, 2024 -
minor: Extract tests for `EXTRACT` AND `date_part` to their own file
#13731 merged
Dec 11, 2024 -
Reveal implementing type and return type in simple UDF implementations
#13730 merged
Dec 11, 2024 -
Improve documentation for TableProvider
#13724 merged
Dec 11, 2024 -
Handle alias when parsing sql(parse_sql_expr)
#12939 merged
Dec 11, 2024 -
refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup`
#13675 merged
Dec 11, 2024 -
refactor: simplify the `make_udf_function` macro
#13712 merged
Dec 11, 2024 -
Minor: Output elapsed time for sql logic test
#13718 merged
Dec 10, 2024 -
Update prost-build requirement from =0.13.3 to =0.13.4
#13698 merged
Dec 10, 2024 -
Optimize performance of `character_length` function
#13696 merged
Dec 10, 2024 -
chore: reinstate down_cast_any_ref
#13705 merged
Dec 9, 2024 -
Improve substr() performance by avoiding using owned string
#13688 merged
Dec 9, 2024 -
Fix hash join with sort push down
#13560 merged
Dec 9, 2024 -
[minor]: Simplifications
#13697 merged
Dec 9, 2024 -
Unlock lexical-write-integer version.
#13693 merged
Dec 9, 2024 -
refactor: use `LazyLock` in the `user_doc` macro
#13684 merged
Dec 8, 2024 -
Temporary fix for CI
#13689 merged
Dec 8, 2024 -
Performance: enable array allocation reuse (`ScalarFunctionArgs` gets owned `ColumnReference`)
#13637 merged
Dec 8, 2024 -
Refactor regexplike signature
#13394 merged
Dec 8, 2024 -
chore: macros crate cleanup
#13685 merged
Dec 7, 2024 -
fix: repartitioned reads of CSV with custom line terminator
#13677 merged
Dec 7, 2024 -
Minor: Rephrase MSRV policy to be more explanatory
#13668 merged
Dec 6, 2024 -
Minor: Comment temporary function for documentation migration
#13669 merged
Dec 6, 2024 -
refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 3)
#13658 merged
Dec 6, 2024 -
refactor: replace `OnceLock` with `LazyLock` (round 2)
#13674 merged
Dec 6, 2024 -
Increase minimum supported Rust version (MSRV) to 1.80.1
#13667 merged
Dec 5, 2024 -
chore: Minor code improvements suggested by newer clippy
#13666 merged
Dec 5, 2024 -
Doc gen: Migrate aggregate functions doc to attribute based.
#13646 merged
Dec 5, 2024 -
Retract IndexSet, IndexMap type alias
#13655 merged
Dec 5, 2024 -
fix: cargo msrv check failed
#13654 merged
Dec 5, 2024 -
Add csv loading benchmarks.
#13544 merged
Dec 5, 2024 -
test: support run filter_pushdown on windows machine
#13610 merged
Dec 5, 2024 -
Allow place holders like `$1` in more types of queries.
#13632 merged
Dec 5, 2024 -
Minor: add examples for using `displayable` to show `ExecutionPlans`
#13636 merged
Dec 5, 2024 -
Report current function name when invoke result length wrong
#13643 merged
Dec 5, 2024 -
Update governance page for 7 days of voting
#13629 merged
Dec 4, 2024 -
Deprecate `RuntimeConfig`, update code to use new builder style
#13635 merged
Dec 4, 2024 -
Allow ColumnarValue to array conversion with less copying
#13644 merged
Dec 4, 2024 -
refactor: replace `OnceLock` with `LazyLock`
#13641 merged
Dec 4, 2024 -
refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2)
#13524 merged
Dec 4, 2024 -
Report current operation when coercion fails
#13628 merged
Dec 4, 2024 -
allow http in datafusion-cli http object store
#13626 merged
Dec 4, 2024 -
fix: CI build failed on main
#13640 merged
Dec 4, 2024 -
Improve unparsing after optimize_projections optimization
#13599 merged
Dec 4, 2024 -
[minor]: Introduce IndexSet and IndexMap aliases.
#13611 merged
Dec 4, 2024 -
feat(substrait): remove dependency on datafusion default features
#13594 merged
Dec 4, 2024 -
Create `ArrayScalarBuilder` for creating single element List arrays
#13623 merged
Dec 4, 2024 -
feat: Add GroupColumn `Decimal128Array`
#13564 merged
Dec 4, 2024 -
[minor] Consolidate construction of the list field
#13627 merged
Dec 4, 2024 -
Increase minimum supported Rust version (MSRV) to 1.80
#13622 merged
Dec 3, 2024 -
[refactor]: Convert Vec<PhysicalExpr> to HashSet<PhysicalExpr>
#13612 merged
Dec 3, 2024 -
Add generate_series() udtf (and introduce 'lazy' `MemoryExec`)
#13540 merged
Dec 2, 2024 -
[minor] Fix logo image path by using absolute url
#13619 merged
Dec 2, 2024 -
Minor: Simplify `IdentTaker`
#13609 merged
Dec 1, 2024 -
Improve unsupported compound identifier message
#13605 merged
Nov 30, 2024 -
support unknown col expr in proto
#13603 merged
Nov 30, 2024 -
chore: exposing ArraySize and ArrayFlatten
#13600 merged
Nov 30, 2024 -
Fix `LogicalPlan::..._with_subqueries` methods
#13589 merged
Nov 30, 2024 -
Use // for unparsing DuckDB division operator
#13509 merged
Nov 30, 2024 -
Add SimpleScalarUDF::new_with_signature
#13592 merged
Nov 30, 2024 -
[minor]: Update median implementation
#13554 merged
Nov 29, 2024 -
Tidy up join test code
#13604 merged
Nov 29, 2024 -
feat(substrait): support-try-cast
#13562 merged
Nov 29, 2024 -
Test sort merge join on TPC-H benchmark
#13572 merged
Nov 29, 2024 -
refactor: add `get_available_parallelism` function
#13595 merged
Nov 29, 2024 -
Apply clippy fixes for Rust 1.83
#13596 merged
Nov 29, 2024 -
Minor: Add example of backporting / `cherry-pick`ing to release branch
#13565 merged
Nov 28, 2024 -
Temporarily pin toolchain version to avoid clippy
#13598 merged
Nov 28, 2024 -
[Minor] Use std::thread::available_parallelism instead of `num_cpus`
#13579 merged
Nov 28, 2024 -
chore: rename known project ZincObserve to OpenObserve
#13587 merged
Nov 28, 2024 -
Supplement as_*_array functions
#13580 merged
Nov 28, 2024 -
Remove redundant type constraints from ScalarUDF from
#13578 merged
Nov 28, 2024 -
Deprecate `adjust_output_array` in favor of `PrimitiveArray::with_data_type`
#13585 merged
Nov 28, 2024 -
Doc gen: Attributes to support `related_udf`, `alternative_syntax`
#13575 merged
Nov 27, 2024 -
chore(deps): update bigdecimal from 0.4.1 to 0.4.6
#13569 merged
Nov 27, 2024 -
feat: Add `Boolean` Column Support for Window Functions
#13577 merged
Nov 27, 2024 -
Fix Duplicated filters within (filter(TableScan)) plan for unparser
#13422 merged
Nov 27, 2024 -
Add zero-`decimal`-cast test
#13571 merged
Nov 27, 2024
38 Pull requests opened by 30 people
-
Implement RightSemi join for SortMergeJoin
#13584 opened
Nov 27, 2024 -
Test for string / numeric coercion
#13606 opened
Nov 29, 2024 -
[POC] Fuse operations in `equal_rows_arr`
#13607 opened
Nov 29, 2024 -
Replace is_sorted helper with standard one.
#13608 opened
Nov 30, 2024 -
POC: Eliminate unnecessary group by keys (q35 in clickbench 1.35x faster)
#13617 opened
Dec 1, 2024 -
Add `LogicalPlanStats` to logical plan nodes
#13618 opened
Dec 1, 2024 -
add cross rt execution code
#13634 opened
Dec 3, 2024 -
WIP: fix regression after replacing `Vec<PhysicalExpr>` with `HashSet<PhysicalExpr>`
#13656 opened
Dec 5, 2024 -
WIP Upgrade to arrow-rs/parquet `54.0.0`
#13663 opened
Dec 5, 2024 -
Add related source code locations to errors
#13664 opened
Dec 5, 2024 -
Add snapshot testing to CLI & set up AWS mock
#13672 opened
Dec 5, 2024 -
feat: support `RightAnti` for `SortMergeJoin`
#13680 opened
Dec 7, 2024 -
Support specific `GroupsAccumulator` for `median`
#13681 opened
Dec 7, 2024 -
Make scalar and array handling for array_has consistent
#13683 opened
Dec 7, 2024 -
PoC Adaptive round robin repartitioning
#13699 opened
Dec 9, 2024 -
Document SQL dialect guidance
#13706 opened
Dec 9, 2024 -
Always add round robin repartitioning to leaves (data sources), benefitting unbalanced / small datasets
#13707 opened
Dec 9, 2024 -
chore: reinstate find_df_window_func
#13708 opened
Dec 9, 2024 -
Deprecate ScalarUDFImpl::return_type
#13717 opened
Dec 10, 2024 -
[POC] Try to plan ast::Expr::CompoundFieldAccess syntax
#13734 opened
Dec 11, 2024 -
Add sum statistics and PhysicalExpr::column_statistics
#13736 opened
Dec 11, 2024 -
Support binary temporal arithmetic with integers
#13741 opened
Dec 12, 2024 -
Round floats but not decimals in SqlLogicTests
#13743 opened
Dec 12, 2024 -
[substrait] Add support for ExtensionTable
#13772 opened
Dec 13, 2024 -
Feature scalar regexp match benchmark
#13789 opened
Dec 15, 2024 -
verify TPC-DS results
#13791 opened
Dec 16, 2024 -
feat: add `AsyncCatalogProvider` helpers for asynchronous catalogs
#13800 opened
Dec 16, 2024 -
chore: Migration Guide
#13849 opened
Dec 19, 2024 -
ParquetSink should be aware of arrow schema encoding for the file metadata.
#13866 opened
Dec 21, 2024 -
Require all zero argument UDFs use `Signature::Nullary`, improve error messages
#13871 opened
Dec 21, 2024 -
doc-gen: migrate builtin scalar functions documentation to attribute based
#13878 opened
Dec 22, 2024 -
Add substrait tpch round trip tests from sql query
#13888 opened
Dec 23, 2024 -
chore(deps): update parquet requirement from 53.3.0 to 54.0.0
#13892 opened
Dec 24, 2024 -
Implement maintains_input_order for AggregateExec
#13897 opened
Dec 24, 2024 -
Consolidate example: simplify_udaf_expression.rs into advanced_udaf.rs
#13905 opened
Dec 26, 2024 -
Correct return type for initcap scalar function with utf8view
#13909 opened
Dec 26, 2024 -
Make it easier to make optimizers: Move join input swapping and related methods into PhysicalOperators
#13910 opened
Dec 26, 2024 -
Consolidate Examples: memtable.rs and parquet_multiple_files.rs
#13913 opened
Dec 26, 2024
97 Issues closed by 19 people
-
Contemplate stop CI testing on intel mac
#13846 closed
Dec 26, 2024 -
sql result discrepancy with sqlite and postgres
#13779 closed
Dec 26, 2024 -
Datafusion v19.rc1 scan parquet 20x slower than DuckDB v0.6.1 on 15GB ClickBench data
#5404 closed
Dec 26, 2024 -
Preserve constant values in union operations
#13804 closed
Dec 25, 2024 -
Support unparsing `LogicalPlan::Extension` to SQL text
#13753 closed
Dec 25, 2024 -
Downloading IMDB dataset for benchmarks gives 404 Not Found
#13896 closed
Dec 25, 2024 -
Incorrect CASE WHEN + ELSE NULL behavior
#13885 closed
Dec 25, 2024 -
Support unparsing implicit lateral `UNNEST` plan to SQL text
#13793 closed
Dec 25, 2024 -
Test DataFusion 44.0.0 with Comet
#13835 closed
Dec 24, 2024 -
`swap_hash_join` is no longer public so comet doesn't compile
#13898 closed
Dec 24, 2024 -
Making the `recursive` dependency an optional feature
#13766 closed
Dec 24, 2024 -
inner join involving hive-partitioned parquet dataset and filters on LHS and RHS causes panic
#9797 closed
Dec 24, 2024 -
Support (order by / sort) for DataFrameWriteOptions
#13873 closed
Dec 24, 2024 -
Support 1 or 3 arg in `generate_series()` UDTF
#13615 closed
Dec 24, 2024 -
Panic when querying a hive-partitioned parquet dataset created with wrong column name
#10020 closed
Dec 23, 2024 -
Consolidate Example: simplify_udwf_expression.rs into advanced_udwf.rs #13842
#13843 closed
Dec 23, 2024 -
Make migration to `Signature::nullary` in 44.0.0 easier / less confusing
#13763 closed
Dec 23, 2024 -
Make `DocBuilder` migration in `44.0.0` easier
#13764 closed
Dec 22, 2024 -
Consolidate Example: dataframe_output.rs into dataframe.rs
#13844 closed
Dec 22, 2024 -
Test DataFusion 44.0.0 with delta.rs
#13834 closed
Dec 22, 2024 -
Regression in 43.0.0: coalesce no longer works between Utf8 and Utf8View columns
#13568 closed
Dec 22, 2024 -
Support unicode character for `initcap` function
#13711 closed
Dec 22, 2024 -
Rename / simplify `BuiltInWindowExpr` / `BuiltInWindowFunctionExpr`
#13473 closed
Dec 21, 2024 -
Substrait roundtrip fails for Sort with a fetch
#13860 closed
Dec 21, 2024 -
List available functions (`SHOW FUNCTIONS`)
#12144 closed
Dec 21, 2024 -
Add `greatest(T,...)` and `least(T,...)` SQL functions
#6531 closed
Dec 20, 2024 -
Support per-option value normalization
#11650 closed
Dec 20, 2024 -
Proposal: Restructure DataFusion site
#1821 closed
Dec 20, 2024 -
substrait_integration integration tests are failing
#13854 closed
Dec 20, 2024 -
Preserve ordering equivalencies on `with_reorder`
#13769 closed
Dec 20, 2024 -
Support n-ary monotonic functions in ordering equivalence
#13839 closed
Dec 20, 2024 -
support make_interval function
#6951 closed
Dec 19, 2024 -
Provide an example of using a remote catalog
#13714 closed
Dec 19, 2024 -
ScalarFunctionExpr does not preserve the nullable flag on serialization roundtrip
#13829 closed
Dec 19, 2024 -
Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`)
#13822 closed
Dec 19, 2024 -
Use `BooleanBuilder::append_n` to generate default values in filtered masks
#13144 closed
Dec 19, 2024 -
CSV can't parse null value for non-string type (i32, i64, float)
#12904 closed
Dec 18, 2024 -
Ignore empty (parquet) files when using ListingTable
#13737 closed
Dec 18, 2024 -
Theoretical integer overflow in `StringArrayBuilder` / `LargeStringArrayBuilder`
#13796 closed
Dec 18, 2024 -
`expr.get_type` (`ExprSchemable::get_type`) returns wrong type for array functions on nested lists
#13755 closed
Dec 18, 2024 -
`array_distinct` fails when input is empty
#13809 closed
Dec 17, 2024 -
Bloom filters don't work with Dictionary encoded columns
#13574 closed
Dec 17, 2024 -
Upgrade from 40 to 43 causes utf8 timestamp queries to fail
#13625 closed
Dec 17, 2024 -
Configuration Mutation Isolation
#4617 closed
Dec 17, 2024 -
Public some fields for about functions
#4029 closed
Dec 17, 2024 -
Improved support for "User Defined Catalogs"
#5291 closed
Dec 17, 2024 -
[EPIC] Full support `wasm32-unknown-unknown` target (aka web assembly)
#7651 closed
Dec 17, 2024 -
[DISCUSSION] More SqlLogicTest test coverage for queries, including join queries
#13470 closed
Dec 17, 2024 -
Schema error when returning DenseUnion from ScalarUDF
#13762 closed
Dec 17, 2024 -
Limit together with pushdown_filters
#13745 closed
Dec 16, 2024 -
Unnest relation can't accept a field from its join table
#13659 closed
Dec 16, 2024 -
StringArrayBuilder and LargeStringArrayBuilder don't check null buffer length
#13759 closed
Dec 15, 2024 -
"recursive" Dependency Causes "section too large" Error When Compiling for wasm
#13513 closed
Dec 14, 2024 -
Substrait: FieldReference not created correctly; missing a RootType
#13645 closed
Dec 13, 2024 -
Release DataFusion `42.0.0`
#11902 closed
Dec 13, 2024 -
doc-gen: Migrate windows functions from code based documentation to attribute based
#13670 closed
Dec 13, 2024 -
Improve performance of `corr` function
#13549 closed
Dec 12, 2024 -
Allow to unparse `UNNEST` plan back to a table function SQL text
#13601 closed
Dec 11, 2024 -
DataFrame parse_sql_expr does not handle aliases
#12518 closed
Dec 11, 2024 -
Use HashMap to store Arc<dyn PhysicalExpr>
#8027 closed
Dec 11, 2024 -
December 2024 ASF Board Report
#10157 closed
Dec 11, 2024 -
Improve substr() performance by avoiding using owned string
#13687 closed
Dec 9, 2024 -
[Epic] Prepared Statement Support
#4539 closed
Dec 9, 2024 -
An error occurred when the sort push down rule pushed sort below join
#13559 closed
Dec 9, 2024 -
Retry logic in ParquetSink
#13679 closed
Dec 9, 2024 -
CI failed due to the dependency `lexical-write-integer` upgrade
#13686 closed
Dec 9, 2024 -
Perf: Allow User defined functions to potentially reuse their argument arrays (to avoid new allocations)
#13516 closed
Dec 8, 2024 -
The file with non-standard newline character can't be read when sqllogictests testing
#12328 closed
Dec 7, 2024 -
Optimizing `LogicalPlan` with placeholders fails
#8819 closed
Dec 6, 2024 -
Release Minor DataFusion 43.1.0 release
#13499 closed
Dec 5, 2024 -
Enhance SortMergeJoin to support Join filters(non-equal Join conditions)
#4364 closed
Dec 5, 2024 -
Support optional filter in SortMergeJoin
#2628 closed
Dec 5, 2024 -
CI: cargo msrv check failed
#13653 closed
Dec 5, 2024 -
Support parameter (`$1`) type inference for `LIKE` predicates
#5617 closed
Dec 5, 2024 -
SQL on multiple parquet files doesn't work (returns ++ instead of result)
#6732 closed
Dec 4, 2024 -
Trigger the Sort Merge Join benchmark using the GitHub action
#10109 closed
Dec 4, 2024 -
Review Backlog and Plan - Andrew Lamb - Nov 2024
#13386 closed
Dec 4, 2024 -
CI build failed on main
#13639 closed
Dec 4, 2024 -
[substrait] make dependency on parquet optional
#13593 closed
Dec 4, 2024 -
Implement GroupColumn Decimal128Array
#13505 closed
Dec 4, 2024 -
Evaluate vectorized hash table for group aggregation
#7095 closed
Dec 3, 2024 -
Nov 20. 2024: This week in DataFusion
#13503 closed
Dec 3, 2024 -
Can't write record batch after upgrade from 40 to 43 due to Utf8View incompatibility
#13624 closed
Dec 3, 2024 -
suggest add syntax `select from generate_series()`
#10069 closed
Dec 2, 2024 -
crates.io page has broken logo
#13526 closed
Dec 2, 2024 -
Remove unnecessary null checks in `GroupColumn`s
#12944 closed
Dec 1, 2024 -
[substrait] support try_cast
#13419 closed
Nov 29, 2024 -
Test TPCH with sort merge join
#13573 closed
Nov 29, 2024 -
Move available_parallelism() into utility function
#13591 closed
Nov 29, 2024 -
Fix build issues on latest stable Rust toolchain (1.83)
#13597 closed
Nov 29, 2024 -
Documentation: Add `related_udfs` and `alternative_syntax` to doc gen macros
#13553 closed
Nov 27, 2024 -
[TESTS] Fix sqllogictests to support bigdecimal 0.4.3 crate
#10001 closed
Nov 27, 2024 -
Using row_number() with ordering on boolean columns produces an internal error
#13566 closed
Nov 27, 2024 -
Cannot cast varchar `'0'` to `decimal(p,0)`
#12870 closed
Nov 27, 2024 -
Excessive memory consumption on sorting
#10511 closed
Nov 26, 2024
84 Issues opened by 39 people
-
Consolidate Examples: memtable.rs and parquet_multiple_files.rs
#13912 opened
Dec 26, 2024 -
Consolidate Example: dataframe_subquery.rs into dataframe.rs
#13911 opened
Dec 26, 2024 -
Enhance documentation site to have versions
#13908 opened
Dec 26, 2024 -
initcap function expected return type does not match actual
#13907 opened
Dec 26, 2024 -
SELECT * FROM subquery ignores ordering
#13904 opened
Dec 25, 2024 -
[substrait] customizable producer
#13901 opened
Dec 24, 2024 -
Find a way to communicate the ordering of a file back with the existing listing table implementation
#13891 opened
Dec 24, 2024 -
Functionality of `array_repeat` udf
#13872 opened
Dec 21, 2024 -
Replace `BufferBuilder<u8>` with `Vec<u8>`
#13867 opened
Dec 21, 2024 -
[substrait] refactor consumer.rs
#13864 opened
Dec 20, 2024 -
[substrait] more abstract SubstraitConsumer API
#13863 opened
Dec 20, 2024 -
Test DataFusion 44.0.0 with Sail
#13855 opened
Dec 20, 2024 -
datafusion-substrait API docs on docs.rs are broken
#13853 opened
Dec 19, 2024 -
FFI Execution Plans that spawn threads panic
#13851 opened
Dec 19, 2024 -
Support multiply and divide on intervals
#13850 opened
Dec 19, 2024 -
Improve efficiency of CI checks (so we can add MORE!)
#13845 opened
Dec 19, 2024 -
Consolidate Example: simplify_udaf_expression.rs into advanced_udaf.rs
#13842 opened
Dec 19, 2024 -
[DISCUSS] Single Source `ExecutionPlan` Across All `TableProviders`
#13838 opened
Dec 19, 2024 -
Why does `PruningPredicate` reference a `row_count` for each column?
#13836 opened
Dec 19, 2024 -
OOM in `GroupedHashAggregateStream::group_aggregate_batch()`
#13831 opened
Dec 18, 2024 -
Add version checking to FFI crate
#13827 opened
Dec 18, 2024 -
Compute ScalarFunction properties including `return_type` and `nullable` on creation
#13825 opened
Dec 18, 2024 -
parquet RowGroup pruning for `Dictionary(Decimal)` type incorrect
#13821 opened
Dec 17, 2024 -
Cannot create a `List` of `FixedSizedList` in SQL
#13819 opened
Dec 17, 2024 -
Browser-accessible official DataFusion playground
#13818 opened
Dec 17, 2024 -
Datafusion binary size has been getting bigger
#13816 opened
Dec 17, 2024 -
[EPIC] A collection of tickets for improved WASM support in DataFusion
#13815 opened
Dec 17, 2024 -
Building project takes a *long* time (esp compilation time for `datafusion` core crate)
#13814 opened
Dec 17, 2024 -
[EPIC] A collection of items to improve developer / CI speed
#13813 opened
Dec 17, 2024 -
Complete / integrate sqlite sqllogictest test scripts integration
#13812 opened
Dec 17, 2024 -
[EPIC] Run full sqllogic / sqlite test suite against DataFusion
#13811 opened
Dec 17, 2024 -
Convert sort to partial_sort when the table is unbounded
#13807 opened
Dec 17, 2024 -
2gb parquet file takes 100s to process, even on second attempt (on main)
#13785 opened
Dec 14, 2024 -
sql result discrepancy with sqlite, postgres and duckdb bug #3
#13784 opened
Dec 14, 2024 -
sql result discrepancy with sqlite, postgres and duckdb bug #2
#13782 opened
Dec 14, 2024 -
sql odd case of rounding compared to duckdb and postgresql
#13781 opened
Dec 14, 2024 -
sql result discrepancy with sqlite, postgres and duckdb
#13780 opened
Dec 14, 2024 -
multiply overflow in stats.rs
#13775 opened
Dec 13, 2024 -
Natural Join Column Qualification Conflict
#13774 opened
Dec 13, 2024 -
Add mechanism to allow using custom ParquetFileReaderFactory from the ParquetFormat options
#13773 opened
Dec 13, 2024 -
[substrait] Add support for ExtensionTable
#13771 opened
Dec 13, 2024 -
Improve join performance for h2o queries
#13765 opened
Dec 13, 2024 -
Dec 13, 2024: This week(s) in DataFusion
#13760 opened
Dec 13, 2024 -
`flatten` should be single-step, not recursive
#13757 opened
Dec 13, 2024 -
Optimize `ColumnarValue::into_array` / `ScalarValue::to_array` / `ScalarValue::to_array_of_size`
#13754 opened
Dec 13, 2024 -
Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries
#13748 opened
Dec 12, 2024 -
Allow overriding SQL path base for benchmarks
#13744 opened
Dec 12, 2024 -
Allow to filter null in `array_agg`
#13742 opened
Dec 12, 2024 -
Another way to get the possible return type of function for information_schema
#13735 opened
Dec 11, 2024 -
Improve Aggregate with Limit
#13729 opened
Dec 11, 2024 -
CI: Windows flow takes 1.5h
#13726 opened
Dec 10, 2024 -
Sort out tests in `aggregate.slt`
#13723 opened
Dec 10, 2024 -
Avoid explicit cast during execution in `corr` aggregate function
#13721 opened
Dec 10, 2024 -
FileFormat API makes it hard to avoid round trips
#13720 opened
Dec 10, 2024 -
Public API `ScalarUDFImpl.return_type` returns internal error in some cases
#13716 opened
Dec 10, 2024 -
Blog / Example of how to compile DataFusion to WASM
#13715 opened
Dec 10, 2024 -
March 2025 ASF Board Report
#13713 opened
Dec 10, 2024 -
Document the SQL dialect DataFusion attempts to follow
#13704 opened
Dec 9, 2024 -
Write "upgrade guide" for DataFusion 44.0.0
#13702 opened
Dec 9, 2024 -
Make SqlToRel respect parser options from ContextProvider
#13700 opened
Dec 9, 2024 -
Move CPU Bound Tasks off Tokio Threadpool
#13692 opened
Dec 8, 2024 -
`array_has` has inconsistent null handling for scalars and arrays
#13682 opened
Dec 7, 2024 -
LogicalPlan::get_parameter_types fails to return all placeholders
#13678 opened
Dec 6, 2024 -
Report multiple errors, not just the first one
#13676 opened
Dec 6, 2024 -
Possible hidden schema mismatch for HashJoin in ProjectionExec
#13673 opened
Dec 6, 2024 -
doc-gen: Migrate builtin scalar functions from code based documentation to attribute based
#13671 opened
Dec 5, 2024 -
Use `cargo semver-checks` for release testing
#13665 opened
Dec 5, 2024 -
Add related source code locations to errors
#13662 opened
Dec 5, 2024 -
[DISCUSS] More extensive pre-release testing
#13661 opened
Dec 5, 2024 -
Automatically check "invariants"
#13652 opened
Dec 5, 2024 -
More sensible handling of selecting camelcase columns
#13649 opened
Dec 4, 2024 -
[EPIC] A collection of items to improve DataFusion stability (reduce effort required to upgrade)
#13648 opened
Dec 4, 2024 -
Optimize SortPreservingMergeStream for single-column merge
#13642 opened
Dec 4, 2024 -
Planning performance regression after replacing `Vec<PhysicalExpr>` with `HashSet<PhysicalExpr>`.
#13638 opened
Dec 4, 2024 -
CSV import doesn't support date parsing
#13633 opened
Dec 3, 2024 -
Test Rust 2024 Edition
#13631 opened
Dec 3, 2024 -
Dec 3. 2024: This week in DataFusion
#13630 opened
Dec 3, 2024 -
Unparse `UNION` plan with multiple inputs to SQL text
#13621 opened
Dec 2, 2024 -
LogicalPlan serde is not yet implemented for Dml
#13616 opened
Dec 1, 2024 -
Simplify `LazyMemoryExec` with `SendableRecordBatchStream`
#13614 opened
Dec 1, 2024 -
Refactor `TableFunctionImpl` to a separate module
#13613 opened
Dec 1, 2024 -
Improve performance of db-benchmark query 8
#13586 opened
Nov 27, 2024 -
Feature Request: Implement `MATCH_RECOGNIZE` for Advanced Pattern Matching
#13583 opened
Nov 27, 2024
77 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Implement predicate pruning for `like` expressions (prefix matching)
#12978 commented on
Dec 23, 2024 • 13 new comments -
Reject CREATE TABLE/VIEW with duplicate column names
#13517 commented on
Dec 3, 2024 • 6 new comments -
Refactor signatures for lpad, rpad, left, and right
#13420 commented on
Dec 16, 2024 • 2 new comments -
[EPIC] Improve examples to make them easier to navigate
#11172 commented on
Dec 26, 2024 • 0 new comments -
Arrow schema is missing from the parquet metadata, for files written by ParquetSink.
#11770 commented on
Dec 26, 2024 • 0 new comments -
Replace `record_batch.get_array_memory_size()` in spilling operators
#13430 commented on
Dec 26, 2024 • 0 new comments -
Release DataFusion `44.0.0`
#13334 commented on
Dec 25, 2024 • 0 new comments -
Introduce a way to represent constrained statistics / bounds on values in Statistics
#8078 commented on
Dec 24, 2024 • 0 new comments -
[Epic] A Collection of Additional UTF8View support tickets
#13504 commented on
Dec 22, 2024 • 0 new comments -
Regression: `Invalid comparison operation: Utf8 == Utf8View` error during LEFT ANTI JOIN
#13510 commented on
Dec 22, 2024 • 0 new comments -
[DISCUSSION] Make it easier and faster to query remote files (S3, iceberg, etc)
#13456 commented on
Dec 21, 2024 • 0 new comments -
[Discuss] Release cadence / patch releases / Long Term Supported (lts) minor releases
#5269 commented on
Dec 20, 2024 • 0 new comments -
Add H2O.ai Database-like Ops benchmark to `dfbench`
#7209 commented on
Dec 19, 2024 • 0 new comments -
[EPIC] Additional Date/Time related open issues
#8282 commented on
Dec 19, 2024 • 0 new comments -
Running tests uses 50.1GB of disk space on Ubuntu
#11105 commented on
Dec 19, 2024 • 0 new comments -
[DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB)
#13525 commented on
Dec 19, 2024 • 0 new comments -
Epic: Better / Improved Documentation, Tutorials and Examples
#7013 commented on
Dec 19, 2024 • 0 new comments -
ListingTable cannot handle partition evolution
#13270 commented on
Dec 19, 2024 • 0 new comments -
DataFusion should support casting strings such as "4e7" to decimal
#10315 commented on
Dec 18, 2024 • 0 new comments -
Extend DiskManager to manage other temp files like temp shuffle IPC files in Ballista, cached data files etc
#4564 commented on
Dec 17, 2024 • 0 new comments -
Support compiling remaining DataFusion crates (`datafusion-core`) to WASM
#7652 commented on
Dec 17, 2024 • 0 new comments -
Update hashbrown requirement from 0.14.5 to 0.15.2
#13557 commented on
Dec 25, 2024 • 0 new comments -
[POC] improve vectorized compare for primitives
#13539 commented on
Nov 28, 2024 • 0 new comments -
feat: Add ConfigOptions to ScalarFunctionArgs
#13527 commented on
Dec 23, 2024 • 0 new comments -
chore: Create devcontainer.json
#13520 commented on
Dec 14, 2024 • 0 new comments -
Support custom field metadata in UDF
#13458 commented on
Nov 27, 2024 • 0 new comments -
Add example for using a separate threadpool for CPU bound work
#13424 commented on
Dec 8, 2024 • 0 new comments -
feat: Add implicit casting to `TypeSignature::String`
#13404 commented on
Dec 4, 2024 • 0 new comments -
fix: Support `Utf8View` in `numeric_string_coercion`
#13366 commented on
Dec 17, 2024 • 0 new comments -
RFC: Add `Precision::AtLeast` and `Precision::AtMost` for more `Statistics`… precision
#13293 commented on
Dec 15, 2024 • 0 new comments -
improve eliminate_outer_join rule
#13249 commented on
Dec 9, 2024 • 0 new comments -
feat: Add regexp_split_to_array function
#13110 commented on
Dec 26, 2024 • 0 new comments -
Add Common Subexpression Elimination for `PhysicalExpr` trees
#13046 commented on
Nov 28, 2024 • 0 new comments -
Fix DISTINCT ON expressions match ORDER BY expressions check - normal…
#13039 commented on
Dec 22, 2024 • 0 new comments -
feat: support inner iejoin
#12754 commented on
Dec 23, 2024 • 0 new comments -
feat: scalar regex match physical expr
#12270 commented on
Dec 21, 2024 • 0 new comments -
Adding node_id to ExecutionPlanProperties
#12186 commented on
Dec 19, 2024 • 0 new comments -
Feat: Implement hf:// / "hugging face" integration in datafusion-cli
#10792 commented on
Dec 19, 2024 • 0 new comments -
Support Null aware anti join by HashJoin
#10584 commented on
Dec 21, 2024 • 0 new comments -
Document DataFusion Threading / tokio runtimes (how to separate IO and CPU bound work)
#12393 commented on
Dec 8, 2024 • 0 new comments -
[DISCUSSION] Challenge: Make DataFusion the fastest engine in ClickBench with custom file format
#13448 commented on
Dec 8, 2024 • 0 new comments -
Statistic: data_size should be in ColumnStatistics.
#7548 commented on
Dec 7, 2024 • 0 new comments -
[Epic] A Collection of Sort Based Optimizations
#10313 commented on
Dec 4, 2024 • 0 new comments -
DataFusion downcasts decimal, losing precision
#13492 commented on
Dec 4, 2024 • 0 new comments -
[Epic] Remove Sort Merge Join Experimental status
#9846 commented on
Dec 4, 2024 • 0 new comments -
Replace `OnceLock` with `LazyLock`
#11687 commented on
Dec 4, 2024 • 0 new comments -
Use Row Format in SortExec
#7053 commented on
Dec 4, 2024 • 0 new comments -
Unify stream deserialization
#13411 commented on
Dec 4, 2024 • 0 new comments -
[EPIC] Improvements to GroupColumn multi-column aggregation performance
#12680 commented on
Dec 4, 2024 • 0 new comments -
Empty strings in CSV files aren't being interpreted as null when using a `Dictionary(_, Utf8)`
#12041 commented on
Dec 3, 2024 • 0 new comments -
[EPIC] (Even More) Grouping / Group By / Aggregation Performance
#7000 commented on
Dec 3, 2024 • 0 new comments -
[DISCUSSION] 2024 Q4 / 2025 Q1 Roadmap
#13274 commented on
Dec 3, 2024 • 0 new comments -
Referencing a column from `select` and `order by` clauses triggers duplicate expression error
#13558 commented on
Dec 2, 2024 • 0 new comments -
Keep the original SQL for CreateExternalTable::definition
#12652 commented on
Dec 2, 2024 • 0 new comments -
Improve vectorized operations of `GroupColumn`
#13275 commented on
Nov 29, 2024 • 0 new comments -
Improve performance of `median` function
#13550 commented on
Nov 28, 2024 • 0 new comments -
Make all SchemaProvider trait APIs async
#10339 commented on
Nov 27, 2024 • 0 new comments -
Document PREPARE statements
#13570 commented on
Nov 27, 2024 • 0 new comments -
Support multiple (>2) results comparison in benchmark scripts
#13446 commented on
Dec 17, 2024 • 0 new comments -
Read string as Utf8View in sqllogictest
#13408 commented on
Dec 17, 2024 • 0 new comments -
Spatial data support
#7859 commented on
Dec 16, 2024 • 0 new comments -
[EPIC] Improved performance in H2O.ai benchmarks
#13548 commented on
Dec 16, 2024 • 0 new comments -
Remove record_batch! macro once upstream updates
#13037 commented on
Dec 15, 2024 • 0 new comments -
Expressions should also evaluate on statistics
#992 commented on
Dec 15, 2024 • 0 new comments -
Proposal: introduced typed expressions, separate AST and IR
#12604 commented on
Dec 14, 2024 • 0 new comments -
Stack Overflow with Deeply Nested Filter Expressions
#8900 commented on
Dec 14, 2024 • 0 new comments -
Feature request: Support for lateral joins
#10048 commented on
Dec 13, 2024 • 0 new comments -
avro_to_arrow: Support in-memory apache_avro `Value`s
#7690 commented on
Dec 12, 2024 • 0 new comments -
SQL/PGQ or even GQL support
#13545 commented on
Dec 12, 2024 • 0 new comments -
`JOIN` should require `ON` condition
#13486 commented on
Dec 12, 2024 • 0 new comments -
Create memory table with target partitions
#12905 commented on
Dec 12, 2024 • 0 new comments -
[BUG] Error when adding Date32 and Int64
#12342 commented on
Dec 12, 2024 • 0 new comments -
Improve performance of ClickBench Q18, Q35,
#13449 commented on
Dec 11, 2024 • 0 new comments -
Support Glob Expressions for S3
#7393 commented on
Dec 11, 2024 • 0 new comments -
Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable`
#13433 commented on
Dec 10, 2024 • 0 new comments -
Improve RepartitionExec for better query performance
#7001 commented on
Dec 9, 2024 • 0 new comments -
Add `SessionConfig` reference to `ScalarFunctionArgs`
#13519 commented on
Dec 9, 2024 • 0 new comments