Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[osh] Return the placeholder type for ${a@a} and ${a[0]@A} #2208

Merged
merged 6 commits into from
Dec 31, 2024

Conversation

akinomyoga
Copy link
Collaborator

This corresponds to "PR(c)" explained in #2201. I'll submit the changes for PR(b) after settling this change.

I noticed that the behavior for ${a[0]@a} is different from Bash:

  • In Bash, a and A are used to mark the nature of a string placeholder (i.e., either a scalar variable or an array element), so ${a[0]@a} generates a and A for indexed and associative arrays, respectively.
  • However, in osh, a and A are used to mark the type of an obtained value (e.g., an empty string for a scalar variables, and a or A for word lists originating from indexed and associative arrays), so ${a[0]@a} generates a empty string because a[0] is not an entire array.

I think the existing code comments in the osh codebase imply that the difference is intentional. Nevertheless, I think the function of @a should be the same as Bash. Maybe one wants to have a way to get the type of the obtained value, but I think that should be implemented as a distinct operator.

@andychu andychu changed the base branch from master to soil-staging December 31, 2024 14:44
@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

OK I played with the tests, and this change is good, since it makes OSH more compatible

However I am confused by the word "string placeholder". Is that word used in bash? I have not heard of it

I think we should mention this in the op-indirect topic in chap-word-lang. But I am not sure how to best describe the behavior

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

I guess I am sorta confused by these definitions, even though I can see this change makes OSH more compatible

  • " a string placeholder (i.e., either a scalar variable or an array element),"
  • "an obtained value (e.g., an empty string for a scalar variables, and a or A for word lists originating from indexed and associative arrays"

I guess the second definition is OK -- necessarily I guess it is repeating the implementation of bash !

the "string placeholder" I think could use more clarification

@akinomyoga
Copy link
Collaborator Author

akinomyoga commented Dec 31, 2024

OK I played with the tests, and this change is good, since it makes OSH more compatible

However I am confused by the word "string placeholder". Is that word used in bash? I have not heard of it

No, I'm sorry that I confused you. I invented that word just to explain the idea. I chose that word because I didn't know how to describe it in other words. Anyway, the word "string placeholder" is only used in the above description and unused in the actual changes of this PR.

I think we should mention this in the op-indirect topic in chap-word-lang. But I am not sure how to best describe the behavior

Do you have a better suggestion for describing "string placeholder"?

I wanted to say "a scalar variable" or "an array element" that holds a string. If we use "variable placeholder" or "variable cell", it means the scalar variable or an entire array, so they do not seem to suit the current situation well to me.

edit: I might try to describe it without using any specific words. Let me think.

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

Ah OK, maybe we can just define some arbitrary terms, like

  • f-value (f comes from ref) - define this to be what was the string placeholder -- it is some kind of value_t that is derived via some rules from the original value, right ?
  • o-value (obtained value) - the second definition

And then the variable can be vsub_state.f_value . The "holder" name seems to suggest I should know what it holds, but I don't know

We could also call it "ff_value" to be more distinct or something, or "z-value" if you want to be more arbitrary


IMO there is no real logic to this other than bash's implementation ... it is not necessarily intuitive

But we have to describe it, so it is useful to make up some terms

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

The oils reference could also say something like:

"The behavior of ${!ref} when combined with operations like ${x@a} is designed to be compatible with bash. These definitions are useful to trace the evaluation:

  • f-value ... whatever it is
  • o-value ... whatever it is
    "

We don't have to explain the whole algorithm, but giving hints can be useful

I find it useful to document things for myself to remember later! And doing it outside the code encourages a fresh look at it

@akinomyoga
Copy link
Collaborator Author

akinomyoga commented Dec 31, 2024

I added a version of doc, which doesn't use specific words, in commit 13491ca. I'll think about f-value, o-value, or something. To me, they seem similar to the lvalue (reference) and rvalue (value) of C++, but "f-value" actually contains even more information about where the cell is included.

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

OK the docs look pretty good

I would say the main thing is to come up with a name for vsub_state.holder_val -- or comment on what it is "holding", if you think that is a good name

If it doesn't have a clear meaning, then I think a more arbitrary term like "f-value" or "z-value" is better


Although the docs do mention "obtained value" -- I think that is not quite clear -- what it is obtained from?

The way I think of it, there are many rules for evaluation, and you get a sequence of values along the way

e.g. if you evaluate 1+2 * 3, you get a result for 2*3, and then 1+6

Similarly with ! and @a. It has many rules and potentially many values - original one, after !, after [], after @a, etc.

@akinomyoga
Copy link
Collaborator Author

bc908c3 I decided to call it h-value (holder) and r-value (result) since "holder" fits my feeling about the concept well. I chose r-value because the concept is similar to rvalue in C++, though our r-value stands for resulting value while C++ rvalue comes from the right-hand side.

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

Hm I tried to rewrite the doc as a table, because I made this thing ul-table, which is a convenient way of doing tables in Markdown:

https://oils.pub/release/0.25.0/doc/ul-table.html

But I ran into this difference when testing it:

 ~/git/oils-for-unix/oils (array-elem-op0)$ bin/osh -c 'arr=(1 2 3); ref=arr[@]; echo ${!ref@a}'
a

 ~/git/oils-for-unix/oils (array-elem-op0)$ bash -c 'arr=(1 2 3); ref=arr[@]; echo ${!ref@a}'
a a a

Should we make it consistent?

Also please look at f84a65d and check if it matches your intention

I think we might only need "h value" -- I am not sure we need "r value". The "r value" ${X} could just be the "value" of ${X}, compared to ${X@A}


You can see the rendered doc in:

https://op.oilshell.org/uuu/github-jobs/8820/

ovm-tarball -> Docs -> Oils reference -> doc/ref/chap-word-lang.html

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

Also it occurs to me that maybe "h-value" is just "the container that held the value" ?

i.e. we are giving the flag a for an arr[0] because the container arr held that string

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

Also this is the quickest way to preview the doc

build/doc.sh split-and-render doc/ref/chap-word-lang.md '' ../../web

It's a little annoying, you have to remember to pass ../../web -- othewrise the CSS will be broken

We could improve that


RENDERED

https://op.oilshell.org/uuu/github-jobs/8820/ovm-tarball.wwz/_release/VERSION/doc/ref/chap-word-lang.html#op-format

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

I guess my table is also leaving out this difference:

$ bash -c 'arr=(1 2 3); ref=arr[@]; echo ${!ref@a}'     # [@] is in the ref
a a a

$ bash -c 'arr=(1 2 3); ref=arr; echo ${!ref[@]@a}'   # [@] is in the BracedVarSub
a

I am not sure if the "h-value" is precisely defined enough to capture that


Sometimes the only way to describe bash is to enumerate what it does in EVERY case! But bash itself may not do that ...

@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

Also I am OK if we don't fix this until later -- we could add a FAILING spec test, that is OK. I often do that to "make a note for later"

I don't want to get too distracted from SparseArray, if this is not necessary for that work

I guess just check if the table form of the doc seems acceptable

@akinomyoga
Copy link
Collaborator Author

But I ran into this difference when testing it:

 ~/git/oils-for-unix/oils (array-elem-op0)$ bin/osh -c 'arr=(1 2 3); ref=arr[@]; echo ${!ref@a}'
a

 ~/git/oils-for-unix/oils (array-elem-op0)$ bash -c 'arr=(1 2 3); ref=arr[@]; echo ${!ref@a}'
a a a

Should we make it consistent?

This is exactly the one that the next PR(b) fixes.

Also please look at f84a65d and check if it matches your intention

I've checked it. I think it matches my intention.

doc/ref/chap-word-lang.md Outdated Show resolved Hide resolved
@akinomyoga
Copy link
Collaborator Author

Also it occurs to me that maybe "h-value" is just "the container that held the value" ?

i.e. we are giving the flag a for an arr[0] because the container arr held that string

Yes, if that explanation works fine for you. I tried to clarify which container it represents because you pointed out that there are multiple processes in the evaluation of ${x...}, where several intermediate containers or arrays are generated.

@akinomyoga
Copy link
Collaborator Author

I guess my table is also leaving out this difference:

$ bash -c 'arr=(1 2 3); ref=arr[@]; echo ${!ref@a}'     # [@] is in the ref
a a a

$ bash -c 'arr=(1 2 3); ref=arr; echo ${!ref[@]@a}'   # [@] is in the BracedVarSub
a

I am not sure if the "h-value" is precisely defined enough to capture that

I believe yes, because ${!ref[@]} is evaluated in this way: 1) first ${ref[@]} produces (arr), which is then joined to be arr. 2) ${arr} is evaluated, which is a synonym of ${arr[0]}. So this should return the same result as ${arr[0]@a}, i.e., a.

@akinomyoga
Copy link
Collaborator Author

Also I am OK if we don't fix this until later -- we could add a FAILING spec test, that is OK. I often do that to "make a note for later"

I don't want to get too distracted from SparseArray, if this is not necessary for that work

As I've written, these are going to be fixed in PR(b) (the next PR). I now rebased it on top of this PR and confirmed that the OSH behavior becomes consistent with Bash after PR(b):

$ bin/osh -c 'arr=(1 2 3); ref=arr[@]; echo ${!ref@a}'
a a a
$ bin/osh -c 'arr=(1 2 3); ref=arr; echo ${!ref[@]@a}'
a
$

@andychu andychu merged commit f2c84da into soil-staging Dec 31, 2024
18 checks passed
@andychu
Copy link
Contributor

andychu commented Dec 31, 2024

OK great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants