Discussion:
Awk output redirection to expression - defined or not?
Ed Morton
2023-05-25 14:30:53 UTC
I'm certain I remember years ago reading a document that said
(paraphrasing) "an unparenthesized expression on the right side of input
or output redirection is undefined behavior" and I thought it was an
older version of the POSIX spec. I now can't find that (or similar)
statement in any of these:

SUSV2 - https://pubs.opengroup.org/onlinepubs/7990989775/xcu/awk.html
SUSV3 -
https://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
Current POSIX spec -
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

or by googling.

What I do see in the current POSIX spec is a related statement just [...]

> getline < "a" "b"
> ( getline < "a" ) "b"
> although many would argue that the intent was that the file ab should [...]
> getline < "x" + 1
> getline < ( "x" + 1 )
> ...
> Since in most cases such constructs are not (or at least should not)
> be used (because they have a natural ambiguity for which there is no
> conventional parsing), the meaning of these constructs has been made
> explicitly unspecified.

> The getline operator can form ambiguous constructs when there are
> unparenthesized binary operators (including concatenate) to the right of
> the '<' (up to the end of the expression containing the getline). The
> result of evaluating such a construct is unspecified

but nothing about output redirection. I know gawk doesn't require parens
around the expression for output redirection but other awks do (e.g. see
https://stackoverflow.com/q/21093626/1745001), and it's not obvious to me
why `getline < "a" "b"` should be undefined behavior while
`print > "a" "b"` wouldn't be. Intuitively, if one of them is undefined
then so should the other be.
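
As a minimal sketch of the portable spelling for both directions: parenthesizing the target removes the ambiguity regardless of how a given awk parses the bare form. The scratch directory, the file name "ab" and the variable names are hypothetical, and `mktemp -d` is assumed to be available.

```shell
# Scratch directory so the sketch does not touch real files.
tmpdir=$(mktemp -d)

result=$(awk -v dir="$tmpdir" 'BEGIN {
    # Output: the parentheses make the whole concatenation the target,
    # so this writes to the file "ab" inside dir.
    print "hello" > (dir "/a" "b")
    close(dir "/a" "b")

    # Input: same idea, and the outer parentheses keep the comparison
    # with 0 applied to the return value of getline.
    if ((getline line < (dir "/a" "b")) > 0)
        print "read back: " line
}')

echo "$result"
```

The unparenthesized forms are deliberately left out, since their parsing is exactly what is in question here.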

Does anyone else recall seeing a statement about output redirection to
an expression requiring parens and, if so, do you recall where it existed?

Ed.
Janis Papanagnou
2023-05-25 15:37:30 UTC
What I recall is that a few times there were discussions about that,
but there was (AFAIR) never a formal explanation.

My thoughts about your question above are as follows...

getline expressions may be subject to precedence rules. C-like
languages (as opposed to e.g. Algol68) associate precedence with the
concrete symbol ('<', '>') rather than with the semantic context, so
'less than' would bind stronger than 'concat'. In cases where (as
quoted above) "conventional parsing" deviates from that (whatever
"conventional" or "non-conventional" may mean) it might be different.

Note also that I wrote "getline *expressions*" as opposed to, say,
"print *statement*"; getline is part of the expression (it has a
value) where print has an expression argument. There is (I think)
no expression that starts with '>' in awk, so 'print >' should be
a redirection indication, generally.

Depending on the semantic context, an expression like
  if (getline < "a" + i) ...
can make sense either way: try reading from "a" and adding a
constant to the return value, or reading from "a1", "a42", etc.

So I can see why one is undefined but not the other. And my coding
approach would be to make the intention visible with parentheses.
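
A rough sketch of the two readings, each spelled out with parentheses so the intent does not depend on the parser. The file names "a" and "a1" and the scratch directory are made up for illustration. Note that with `+` the parenthesized operand would be a numeric sum (in awk, "a" + 1 evaluates to 1), so building a name such as "a1" takes concatenation, which is what the second reading uses here.

```shell
tmpdir=$(mktemp -d)
printf 'from a\n'  > "$tmpdir/a"
printf 'from a1\n' > "$tmpdir/a1"

out=$(awk -v dir="$tmpdir" 'BEGIN {
    i = 1

    # Reading 1: read from "a", then add i to the return value of getline.
    r = (getline line < (dir "/a")) + i
    print r, line

    # Reading 2: read from the file whose name is built from "a" and i.
    r = (getline line < (dir "/a" i))
    print r, line
}')

echo "$out"
```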

Janis
Ed Morton
2023-05-28 13:52:03 UTC
Good point about `if (getline < "foo")` being valid while
`if (print > "foo")` is not, thanks.

In different parts of the POSIX spec they refer to `getline` as a
"function", an "operator" and a "keyword" (while "print" is referred
to as a "statement" and a "keyword"), so it's a little hard to say
exactly what `getline` is. They do also say at one point "the
expression containing getline", though, so that does match your thought
above about getline being part of an expression.
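
To illustrate the "part of an expression" point, here is a small sketch in which the value of getline is used directly in a loop condition and in an assignment, something a print statement cannot do. The file names and the scratch directory are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/data"

out=$(awk -v f="$tmpdir/data" -v missing="$tmpdir/no-such-file" 'BEGIN {
    # getline yields a value (1 = record read, 0 = end of file, -1 = error),
    # so it can sit inside a larger expression such as a loop condition.
    n = 0
    while ((getline line < f) > 0)
        n++
    close(f)
    print "lines read:", n

    # Reading from a file that does not exist returns -1.
    rc = (getline line < missing)
    print "missing file:", rc
}')

echo "$out"
```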

Ed.
Janis Papanagnou
2023-05-28 16:12:33 UTC
Post by Ed Morton
[...]
Good point about `if (getline < "foo")` being valid while `if (print >
"foo")` is not, thanks.
In different parts of the POSIX spec they refer to `getline` as a
"function" and an "operator" and a "keyword" (while "print" is referred
to as a "statement" and a "keyword") so it's a little hard to say
exactly what `getline` is but they do also say at one point "the
expression containing getline" so that does match your thought above
about getline being part of an expression.
I haven't inspected the POSIX specs for that, but the points you
quote here are (quite) consistent and coherent.
Of course both (print/getline) can be [implemented as] "keywords"
whether they are "statements" or "functions".
The distinction [for getline] between "function" and "operator"
I consider a bit imprecise; usually I think of these as syntactically
differing forms:

minus(x) vs. -x
minus(x,y) vs. x-y or even x minus y

A function and an operator can of course both be part of an
expression.

Janis