Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot create a List of FixedSizedList in SQL #13819

Open
alamb opened this issue Dec 17, 2024 · 9 comments
Open

Cannot create a List of FixedSizedList in SQL #13819

alamb opened this issue Dec 17, 2024 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Dec 17, 2024

Describe the bug

I expect that when I make a List / Array the element types remain the same

For example, a list of Int32 makes List(Int32)

> select arrow_typeof([1, 2, 3]);
+------------------------------------------------------------------------------------------------------------------+
| arrow_typeof(make_array(Int64(1),Int64(2),Int64(3)))                                                             |
+------------------------------------------------------------------------------------------------------------------+
| List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) |
+------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

However, creating a List of FixedSizedList creates a List(List)) rather than a List(FixedSizedList)

> select arrow_typeof([arrow_cast([1,2,3], 'FixedSizeList(3, Int64)')]);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| arrow_typeof(make_array(arrow_cast(make_array(Int64(1),Int64(2),Int64(3)),Utf8("FixedSizeList(3, Int64)"))))                                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| List(Field { name: "item", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

To Reproduce

This creates a FixedSizeList

arrow_cast([1,2,3], 'FixedSizeList(3, Int64)')

You can see that here:

> select arrow_typeof(arrow_cast([1,2,3], 'FixedSizeList(3, Int64)'));
+------------------------------------------------------------------------------------------------------------------------------+
| arrow_typeof(arrow_cast(make_array(Int64(1),Int64(2),Int64(3)),Utf8("FixedSizeList(3, Int64)")))                             |
+------------------------------------------------------------------------------------------------------------------------------+
| FixedSizeList(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3) |
+------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

I expect that by wrapping it with [] it becomes a List of FixedSizedList, but it does not:

> select arrow_typeof([arrow_cast([1,2,3], 'FixedSizeList(3, Int64)')]);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| arrow_typeof(make_array(arrow_cast(make_array(Int64(1),Int64(2),Int64(3)),Utf8("FixedSizeList(3, Int64)"))))                                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| List(Field { name: "item", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Expected behavior

I expect the output of this to be

> select arrow_typeof([arrow_cast([1,2,3], 'FixedSizeList(3, Int64)')]);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| arrow_typeof(make_array(arrow_cast(make_array(Int64(1),Int64(2),Int64(3)),Utf8("FixedSizeList(3, Int64)"))))                                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| List(Field { name: "item", data_type: FixedSizeList(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Additional context

Follow on from discussion in

@alamb alamb added the bug Something isn't working label Dec 17, 2024
@jayzhan211
Copy link
Contributor

jayzhan211 commented Dec 18, 2024

If we eagerly convert a fixed-size list into a standard list, we can avoid handling fixed-size lists in array functions.

While this approach isn't entirely straightforward, it aligns with the intended result:

query T
select arrow_typeof(arrow_cast([1,2,3], 'FixedSizeList(3, Int64)'));
----
FixedSizeList(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3)

// make_array() coerce fixed size list to list
query T
select arrow_typeof(make_array(arrow_cast([1,2,3], 'FixedSizeList(3, Int64)')));
----
List(Field { name: "item", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })

The "issue" that eagerly coerce to list is not clear to me.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Dec 18, 2024

The downside is that we have additional casting that is not necessary.

@alamb
Copy link
Contributor Author

alamb commented Dec 18, 2024

If we eagerly convert a fixed-size list into a standard list, we can avoid handling fixed-size lists in array functions.

I see -- so in your mind this behavior is "working as intended" with the current design? (this is fine from my perspective, I just found it confusing and it lead to lots of discussions on #13756 , so perhaps we can add some more documentation)

I don't have time to work on this particular issue, I was just trying to get it filed

@jayzhan211
Copy link
Contributor

D select [array_value(1,2,3)];
┌───────────────────────────────────────┐
│ main.list_value(array_value(1, 2, 3)) │
│             integer[3][]              │
├───────────────────────────────────────┤
│ [[1, 2, 3]]                           │
└───────────────────────────────────────┘
D select [array_value(1,2,3)][1];
┌──────────────────────────────────────────┐
│ main.list_value(array_value(1, 2, 3))[1] │
│                integer[3]                │
├──────────────────────────────────────────┤
│ [1, 2, 3]                                │
└──────────────────────────────────────────┘
D select array_append(array_value(1,2,3), 4);
┌───────────────────────────────────────┐
│ array_append(array_value(1, 2, 3), 4) │
│                int32[]                │
├───────────────────────────────────────┤
│ [1, 2, 3, 4]                          │
└───────────────────────────────────────┘

Now, I think as long as the fixed size list is not mutated, we can keep it as fixed size list,
otherwise we DON'T cast it back to fixed size list. I guess this reduce the additional casting cost and is more consistent with DuckDB

@findepi findepi changed the title Can not create a List of FixedSizedList in SQL Cannot create a List of FixedSizedList in SQL Dec 19, 2024
@alamb
Copy link
Contributor Author

alamb commented Dec 19, 2024

Now, I think as long as the fixed size list is not mutated, we can keep it as fixed size list,

That makes sense -- if you ever try to change the size of a fixed size list, it may no longer be fixed size!

@alan910127
Copy link
Contributor

take

@alamb
Copy link
Contributor Author

alamb commented Dec 19, 2024

BTW it is not clear to me that the desired behavior for this ticket is clear (so I don't want to waste your time @alan910127 )

I may be mistaken, but it might be a good idea to clarify what you are planning to do here

(is it to make the example I proposed work?)

@alan910127
Copy link
Contributor

alan910127 commented Dec 23, 2024

@alamb Sorry for the late response. Based on your discussion, I think making your example work with the behavior aligning with DuckDB (i.e. don't cast back to FixedSizeList once mutated) might be the better approach, since it reduces implicit casts and their associated cost.

@alamb
Copy link
Contributor Author

alamb commented Dec 24, 2024

@alamb Sorry for the late response. Based on your discussion, I think making your example work with the behavior aligning with DuckDB (i.e. don't cast back to FixedSizeList once mutated) might be the better approach, since it reduces implicit casts and their associated cost.

sounds like a good planto me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants