Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine single-value column to treat it as that single value #12120

Merged
merged 31 commits into from
Jan 27, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
ac478e5
refine single value column type
radeusgd Jan 22, 2025
85fa20e
checkpoint
radeusgd Jan 22, 2025
5a24c2c
debugging
radeusgd Jan 22, 2025
7d10624
NPE in dispatch
radeusgd Jan 22, 2025
5b5f09c
assert
radeusgd Jan 22, 2025
22a39a5
Use TypeOfNode which understands java.time.LocalDate & co.
JaroslavTulach Jan 22, 2025
6ba616b
Merge branch 'develop' into wip/radeusgd/12095-column-as-single-value
radeusgd Jan 23, 2025
04c63e6
base sig
radeusgd Jan 23, 2025
38921c2
re-enabling tests
radeusgd Jan 23, 2025
e81546e
revert sig
radeusgd Jan 23, 2025
586de76
add a few more tests
radeusgd Jan 23, 2025
06dca42
re-add return check after discussion
radeusgd Jan 23, 2025
cc8d523
adapt tests to assumption that we require explicit cast to extract th…
radeusgd Jan 23, 2025
d0044e4
debug
radeusgd Jan 23, 2025
9bb73ae
Avoid dispatch of Any methods on EnsoMultiValue - see also #11827 and…
JaroslavTulach Jan 24, 2025
528bb50
Merge branch 'develop' into wip/radeusgd/12095-column-as-single-value
radeusgd Jan 24, 2025
20c6e81
fix wrongly written test
radeusgd Jan 24, 2025
ef26c37
split tests
radeusgd Jan 24, 2025
3ca42e7
CR: move conversions, update comments
radeusgd Jan 24, 2025
167aeb1
CR: remove redundant unchecked sig
radeusgd Jan 24, 2025
1700cd0
CR: Float is not an Int even if it has 0 fractional part
radeusgd Jan 24, 2025
d98b042
add first sig (rest will be in #12138)
radeusgd Jan 24, 2025
4820070
CR: add rhs test (but still needs cast)
radeusgd Jan 24, 2025
f757300
Workaround for an apparent pattern matching bug?
radeusgd Jan 24, 2025
3fc9d1f
Merge branch 'develop' into wip/radeusgd/12095-column-as-single-value
radeusgd Jan 24, 2025
41db180
yet another fix
radeusgd Jan 24, 2025
464f064
port fix from #12139 to be able to read logs
radeusgd Jan 24, 2025
29f1451
another workaround for matching bug
radeusgd Jan 24, 2025
26ce1c1
need to use the type from the match as it may have extracted something
radeusgd Jan 24, 2025
fc57147
fix ordering of cases
radeusgd Jan 24, 2025
7c937d7
Merge branch 'develop' into wip/radeusgd/12095-column-as-single-value
radeusgd Jan 27, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -174,9 +174,9 @@ type Dictionary key value
insert self key=(Missing_Argument.throw "key") value=(Missing_Argument.throw "value") no_warning:Boolean=False =
new_dict = self.insert_builtin key value
case key of
_ : Float ->
key_as_float : Float ->
if no_warning then new_dict else
Warning.attach (Floating_Point_Equality.Used_As_Dictionary_Key key) new_dict
Warning.attach (Floating_Point_Equality.Used_As_Dictionary_Key key_as_float) new_dict
Comment on lines -177 to +179
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With intersection types, with

case a of
   b : T -> ...

The a and b are no longer interchangeable. The T part could be hidden in a (so you cannot pass it into functions that expect T) and it can be un-hidden by the b : T check at which point b has it 'visible' and b can be passed to f (t : T) whereas a cannot.

So we need to keep in mind that whenever we rely on case of, we should not do _ : T but name it and use the named component. At least anywhere where we may expect the intersection types to come up.

cc: @jdunkerley @GregoryTravis

This is actually an important change to Enso semantics that we kind of knew about in #11600 but I'm not sure we have appreciate its implications enough yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm just reminding.

_ -> new_dict

## GROUP Selections
Expand Down
31 changes: 17 additions & 14 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Column.enso
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ from project.Errors import Conversion_Failure, Inexact_Type_Coercion, Invalid_Co
from project.Internal.Column_Format import all
from project.Internal.Java_Exports import make_string_builder
from project.Internal.Storage import enso_to_java, java_to_enso
from project.Internal.Type_Refinements.Single_Value_Column import refine_with_single_value

polyglot java import org.enso.base.Time_Utils
polyglot java import org.enso.table.data.column.operation.cast.CastProblemAggregator
Expand Down Expand Up @@ -87,7 +88,7 @@ type Column
example_from_vector =
Column.from_vector "My Column" [1, 2, 3, 4, 5]
from_vector : Text -> Vector -> Auto | Value_Type -> Column ! Invalid_Value_Type
from_vector (name : Text) (items : Vector) (value_type : Auto | Value_Type = Auto) =
from_vector (name : Text) (items : Vector) (value_type : Auto | Value_Type = Auto) -> Column =
## If the type does not accept date-time-like values, we can skip the
additional logic for polyglot conversions that would normally be used,
which is quite costly - so if we can guarantee it is unnecessary,
Expand Down Expand Up @@ -120,10 +121,12 @@ type Column
case needs_polyglot_conversion of
True -> Java_Column.fromItems name (enso_to_java_maybe items) expected_storage_type java_problem_aggregator
False -> Java_Column.fromItemsNoDateConversion name items expected_storage_type java_problem_aggregator
result = Column.from_java_column java_column . throw_on_warning Conversion_Failure
Copy link
Member

@JaroslavTulach JaroslavTulach Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Any.throw_on_warning extracts the first value of an intersection type of a multi value that has such a method - e.g. Column and invokes the method on such a simple value while loosing the additional types. Related issues:

The workaround is to avoid calling Any instance methods. Done in 9bb73ae. The only "proper solution" I can think of: if dispatching instance method of Any, send the whole multi value as self.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only "proper solution" I can think of: if dispatching instance method of Any, send the whole multi value as self.

I think that is the behaviour we need for the intersection type solution to make any sense. We can't make it so easy to 'loose' the intersected types. While in libraries we can try to rely on workarounds, our users will get confused if the column stops being an Integer_Column after removing warnings. throw_on_warning and the related functions are user facing methods so they cannot be breaking it.

result.catch Conversion_Failure error->
if error.example_values.is_empty then result else
raise_invalid_value_type_error error.example_values.first
multi_result = Column.from_java_column java_column
result = Warning.throw_on_warning multi_result Conversion_Failure
if Meta.is_error result . not then result else
result.catch Conversion_Failure error->
if error.example_values.is_empty then result else
raise_invalid_value_type_error error.example_values.first

## PRIVATE
Creates a new column given a name and an internal Java storage.
Expand All @@ -135,9 +138,9 @@ type Column

## PRIVATE
Creates a new column given a Java Column object.
from_java_column : Java_Column -> Column
from_java_column java_column =
Column.Value java_column
from_java_column java_column:Java_Column -> Column =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a PRIVATE function. It doesn't really need signature. E.g. the -> Column can be dropped without any impact. Some publicly visible function needs a signature in order for the static analysis to use it. What's that function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the question is: what should be the signature of such a function? @radeusgd has proposed Column & Any. What should it mean if a multi value gets into such a cast?

  • if one of the intersection types of such a multi value is Column, then move it first and unhide all other hidden types
  • if no type of Column is among the intersection types of such a multi value, but there is a conversion from one of its types, then just perform conversion to Column and return the result

Is that how you want Column & Any to behave, @radeusgd?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a PRIVATE function. It doesn't really need signature. E.g. the -> Column can be dropped without any impact. Some publicly visible function needs a signature in order for the static analysis to use it. What's that function?

The idea was that adding the return type check here, affects semantics of all functions that rely on it. So even if I forget to update the signature in some case, all methods that return a Column will have consistent semantics regarding 'hiding' of the intersection type. Thus my plan was to first only have this method updated (to get the desired semantics and experiment with it) and do a refactor of all type signatures in a separate PR - that will be a pretty big and boring change, so I thought it will be easier to review if separated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • if one of the intersection types of such a multi value is Column, then move it first and unhide all other hidden types

I was initially thinking to avoid hiding any additional types that were intersected with Column
. I was not thinking about additionally 'unhiding' any other types. I think it may be acceptable to unhide all types if X & Any is encountered, although I feel that the instanceof metaphor works better if hidden types are not unhidden, just adding & Any guarantees we don't hide any currently visible ones.

I'm happy for either.

But overall yes, that was my initial suggestion for how I think Column & Any should work. I'd be very happy if we can get it working that way :)

Even if we don't end up using Column & Any here and only have Column - requiring the casts to be inserted, I still think that the semantics of X & Any (if it's ever encountered in any code) should indeed be as you suggest above.

column = Column.Value java_column
column |> refine_with_single_value

## PRIVATE
ADVANCED
Expand Down Expand Up @@ -1202,8 +1205,8 @@ type Column
storage_type = Storage.from_value_type_strict common_type
new_storage = Java_Problems.with_problem_aggregator Problem_Behavior.Report_Warning java_problem_aggregator->
case default of
Column.Value java_col ->
other_storage = java_col.getStorage
col : Column ->
other_storage = col.java_column.getStorage
storage.fillMissingFrom other_storage storage_type java_problem_aggregator
_ ->
storage.fillMissing default storage_type java_problem_aggregator
Expand Down Expand Up @@ -2699,9 +2702,9 @@ run_vectorized_binary_op column name operand new_name=Nothing fallback_fn=Nothin
Java_Problems.with_map_operation_problem_aggregator column.name Problem_Behavior.Report_Warning problem_builder->
storage_type = resolve_storage_type expected_result_type
case operand of
Column.Value col2 ->
operand_column : Column ->
s1 = column.java_column.getStorage
s2 = col2.getStorage
s2 = operand_column.java_column.getStorage
rs = Polyglot_Helpers.handle_polyglot_dataflow_errors <|
s1.vectorizedOrFallbackZip name problem_builder fallback_fn s2 skip_nulls storage_type
Column.from_storage effective_new_name rs
Expand Down Expand Up @@ -2792,9 +2795,9 @@ run_vectorized_binary_op_with_fallback_problem_handling column name operand fall
_ -> fallback_fn problem_builder
storage_type = resolve_storage_type expected_result_type
case operand of
Column.Value col2 ->
operand_column : Column ->
s1 = column.java_column.getStorage
s2 = col2.getStorage
s2 = operand_column.java_column.getStorage
rs = Polyglot_Helpers.handle_polyglot_dataflow_errors <|
s1.vectorizedOrFallbackZip name problem_builder applied_fn s2 skip_nulls storage_type
Column.from_storage new_name rs
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
private

from Standard.Base import all

import project.Column.Column
import project.Value_Type.Value_Type
from project.Internal.Type_Refinements.Single_Value_Column_Extensions import all

refine_with_single_value (column : Column) =
## We treat a column as single value if it contains a single not-nothing value.
if is_single_value column . not then column else case column.inferred_precise_value_type of
Value_Type.Integer _ ->
# `inferred_precise_value_type` will return Integer if the column was Float (or Mixed) but contained integral values - e.g. [2.0]
# We inspect the actual value to correctly deal with both Float and Mixed base type.
value = column.at 0
case value of
# If the value was really a float, we preserve that.
_ : Float -> (column : Column & Float)
# Otherwise we treat it as an integer.
_ -> (column : Column & Integer)
Value_Type.Float _ -> (column : Column & Float)
Value_Type.Char _ _ -> (column : Column & Text)
Value_Type.Boolean -> (column : Column & Boolean)
Value_Type.Date -> (column : Column & Date)
Value_Type.Time -> (column : Column & Time_Of_Day)
Value_Type.Date_Time True -> (column : Column & Date_Time)
Value_Type.Decimal _ scale ->
is_integer = scale == 0
if is_integer then (column : Column & Integer) else (column : Column & Decimal)
# Other types (e.g. Mixed) are not supported.
_ -> column

is_single_value column:Column -> Boolean =
(column.length == 1) && (column.at 0 . is_nothing . not)
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
private

from Standard.Base import all

import project.Column.Column
from project.Internal.Type_Refinements.Single_Value_Column import is_single_value

## This conversion is internal and should never be exported.
Integer.from (that : Column) -> Integer =
Runtime.assert (is_single_value that)
x = that.at 0
case x of
_ : Integer -> x
_ : Float ->
Runtime.assert (x % 1.0 == 0.0)
x.truncate

## This conversion is internal and should never be exported.
Float.from (that : Column) -> Float =
Runtime.assert (is_single_value that)
that.at 0

## This conversion is internal and should never be exported.
Text.from (that : Column) -> Text =
Runtime.assert (is_single_value that)
that.at 0

## This conversion is internal and should never be exported.
Boolean.from (that : Column) -> Boolean =
Runtime.assert (is_single_value that)
that.at 0

## This conversion is internal and should never be exported.
Date.from (that : Column) -> Date =
Runtime.assert (is_single_value that)
that.at 0

## This conversion is internal and should never be exported.
Time_Of_Day.from (that : Column) -> Time_Of_Day =
Runtime.assert (is_single_value that)
that.at 0

## This conversion is internal and should never be exported.
Date_Time.from (that : Column) -> Date_Time =
Runtime.assert (is_single_value that)
that.at 0

## This conversion is internal and should never be exported.
Decimal.from (that : Column) -> Decimal =
Runtime.assert (is_single_value that)
that.at 0
6 changes: 3 additions & 3 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Table.enso
Original file line number Diff line number Diff line change
Expand Up @@ -144,13 +144,13 @@ type Table
new : Vector (Vector | Column) -> Table
new columns =
invalid_input_shape =
Error.throw (Illegal_Argument.Error "Each column must be represented by a pair whose first element is the column name and the second element is a vector of elements that will constitute that column, or an existing column.")
Error.throw (Illegal_Argument.Error "Each column must be represented by a pair whose first element is the column name and the second element is a vector of elements that will constitute that column, or an existing column. Got: "+columns.to_text)
cols = columns.map on_problems=No_Wrap.Value c->
case c of
v : Vector ->
if v.length != 2 then invalid_input_shape else
Column.from_vector (v.at 0) (v.at 1) . java_column
Column.Value java_col -> java_col
col : Column -> col.java_column
_ -> invalid_input_shape
Panic.recover Illegal_Argument <|
if cols.is_empty then
Expand Down Expand Up @@ -2472,9 +2472,9 @@ type Table
unique.mark_used self.column_names

resolved = case value of
_ : Column -> value
_ : Text -> self.make_constant_column value
_ : Expression -> self.evaluate_expression value on_problems
_ : Column -> value
Comment on lines 2474 to -2477
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, intersection types have changed something fundamental about Enso semantics.

case of branches that used to be completely disjoint - a value was able to match only one of them - are now no longer disjoint - a multi-value can match multiple branches. So now we need to be more careful about ordering of the branches. If a value can match more than one, which branch is the one that should be preferred?

E.g. here single-text-value column would match both _ : Text and _ : Column branch (because it is Column & Text). Now, we want it to still go to the _ : Column branch, if it went to _ : Text branch that was leading to errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ : Constant_Column -> self.make_constant_column value.value
_ : Simple_Expression -> value.evaluate self (set_mode==Set_Mode.Update && as=="") on_problems
_ -> Error.throw (Illegal_Argument.Error "Unsupported type for `Table.set`.")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,18 +9,18 @@
import org.enso.interpreter.runtime.EnsoContext;
import org.enso.interpreter.runtime.data.EnsoMultiValue;
import org.enso.interpreter.runtime.data.Type;
import org.enso.interpreter.runtime.library.dispatch.TypesLibrary;
import org.enso.interpreter.runtime.library.dispatch.TypeOfNode;

final class AllOfTypesCheckNode extends AbstractTypeCheckNode {

@Children private AbstractTypeCheckNode[] checks;
@Child private TypesLibrary types;
@Child private TypeOfNode typeNode;
@Child private EnsoMultiValue.NewNode newNode;

AllOfTypesCheckNode(String name, AbstractTypeCheckNode[] checks) {
super(name);
this.checks = checks;
this.types = TypesLibrary.getFactory().createDispatched(checks.length);
this.typeNode = TypeOfNode.create();
this.newNode = EnsoMultiValue.NewNode.create();
}

Expand All @@ -46,7 +46,8 @@ Object executeCheckOrConversion(VirtualFrame frame, Object value, ExpressionNode
if (result == null) {
return null;
}
var t = types.getType(result);
var t = typeNode.findTypeOrNull(result);
assert t != null : "Value " + result + " doesn't have type!";
var ctx = EnsoContext.get(this);
if (ctx.getBuiltins().number().getInteger() == t) {
if (++integers > 1) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,19 @@ final class EnsoMultiType {
private final Type[] types;

private EnsoMultiType(Type[] types) {
assert checkNonNull(types);
this.types = types;
}

private static boolean checkNonNull(Type[] types) {
for (var t : types) {
if (t == null) {
return false;
}
}
return true;
}

@CompilerDirectives.TruffleBoundary
static EnsoMultiType findOrCreateSlow(Type[] types, int from, int to) {
var mt = new EnsoMultiType(Arrays.copyOfRange(types, from, to));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -506,7 +506,9 @@ Object invokeMember(
@Override
public String toString() {
var both = EnsoMultiType.AllTypesWith.getUncached().executeAllTypes(dispatch, extra, 0);
return Stream.of(both).map(t -> t.getName()).collect(Collectors.joining(" & "));
return Stream.of(both)
.map(t -> t != null ? t.getName() : "[?]")
.collect(Collectors.joining(" & "));
}

/** Casts {@link EnsoMultiValue} to requested type effectively. */
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -201,11 +201,6 @@ private String getKeyForFile(File file) throws IOException {
}

void release(ReadOnlyExcelConnection excelConnection) throws IOException {
System.out.println("AAAo");
System.err.println("AAAo");
new Exception().printStackTrace();
System.out.println("AAAo2");
System.err.println("AAAo2");
synchronized (this) {
excelConnection.record.refCount--;
if (excelConnection.record.refCount <= 0) {
Expand Down
2 changes: 2 additions & 0 deletions test/Table_Tests/src/In_Memory/Main.enso
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ import project.In_Memory.Fan_Out_Spec
import project.In_Memory.Integer_Overflow_Spec
import project.In_Memory.Lossy_Conversions_Spec
import project.In_Memory.Parse_To_Table_Spec
import project.In_Memory.Single_Value_Column_Spec
import project.In_Memory.Split_Tokenize_Spec
import project.In_Memory.Table_Spec
import project.In_Memory.Table_Xml_Spec
Expand All @@ -33,6 +34,7 @@ add_specs suite_builder =
Integer_Overflow_Spec.add_specs suite_builder
Lossy_Conversions_Spec.add_specs suite_builder
Parse_To_Table_Spec.add_specs suite_builder
Single_Value_Column_Spec.add_specs suite_builder
Split_Tokenize_Spec.add_specs suite_builder
Table_Conversion_Spec.add_specs suite_builder
Table_Date_Spec.add_specs suite_builder
Expand Down
Loading
Loading