Post by Kaz Kylheku
Post by Keith Thompson
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants, and even in one awk variant can behave differently
depending on whether you're setting it to a higher or lower value than
the original.
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
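For reference, the behavior described here is easy to observe by setting OFS to something visible. This is only a sketch: the input lines and the shell variable names `shrunk` and `grown` are made up for illustration, and it assumes an awk that implements these semantics.

```sh
# Decreasing NF drops the trailing fields; $0 is rebuilt using OFS.
shrunk=$(printf 'a b c d\n' | awk 'BEGIN { OFS = "-" } { NF = 2; print }')

# Increasing NF appends empty fields; $0 is rebuilt the same way.
grown=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } { NF = 4; print $0; print NF }')

echo "$shrunk"
echo "$grown"
```

With gawk and mawk this prints `a-b`, then `a-b--` and `4`.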
I don't see that in the POSIX specification.
References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.
NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
That describes what happens if NF is modified by assignment, but I don't
see that it implies that such an assignment is allowed.
Post by Kaz Kylheku
That implies it must cease to exist; i.e. be destroyed. If setting NF = 4 were
to restore $4 then that would mean it had continued to exist, but was only
hidden.
The behavior is present in GNU Awk, Mawk, BusyBox Awk and others.
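A quick way to check the "destroyed, not hidden" reading, as a sketch only: the input line and the shell variable name `restored` are invented for illustration, and the result depends on the awk in use.

```sh
# Shrink the record to three fields, then grow it back to four.
# If $4 had only been hidden, the brackets would contain "d".
restored=$(printf 'a b c d\n' | awk '{ NF = 3; NF = 4; print "[" $4 "]" }')
echo "$restored"
```

With gawk and mawk this prints `[]`: the old $4 is gone and the new $4 is empty.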
I accept that most, quite possibly all, implementations of Awk allow
assignment to NF, with the semantics of dropping fields after $NF or
adding new fields if the value decreases or increases, respectively.
And on the basis of that, I accept that POSIX *should* specify the
behavior of assigning to NF -- especially if the original AWK book
defines it. The second edition briefly mentions modifying NF:
"Conversely, if NF changes, $0 is recomputed when its value is needed."
But I can imagine a hypothetical awk-like language in which assigning to
NF has undefined behavior. My question is, how does the POSIX
specification not describe that language?
Looking more closely at
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
it can be argued that assigning to NF *is* well defined, but it could be
much clearer. The syntax for a simple assignment is:
    lvalue '=' expr
where an lvalue is one of:
    NAME
    NAME '[' expr_list ']'
    '$' expr
and:
    The token NAME shall consist of a word that is not a keyword or a
    name of a built-in function and is not followed immediately (without
    any delimiters) by the '(' character.
Which implies that, for example, `NF = 10` is valid.
Also, NF is a "special variable", which weakly implies that it's
assignable.
On the other hand, it also implies that `foo = 42` is valid where `foo`
is the name of a user-defined function (gawk disallows it). It should
say that the name of a user-defined function is not an lvalue.
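Both cases can be tried directly. This is a sketch, not a conformance test: the function name `foo` comes from the paragraph above, the shell variable names `nf_after` and `verdict` are made up, and the second result is deliberately left unstated because it depends on the implementation.

```sh
# NF is a NAME token, so the grammar accepts it as an lvalue.
nf_after=$(printf 'a b\n' | awk '{ NF = 10; print NF }')
echo "$nf_after"

# A user-defined function name is also a NAME token. Whether the
# assignment below is accepted is implementation-specific.
if awk 'function foo() { return 1 } BEGIN { foo = 42 }' 2>/dev/null; then
    verdict=accepted
else
    verdict=rejected
fi
echo "$verdict"
```

The first command prints `10`; the second reports whether the local awk accepts or rejects the assignment to `foo`.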
The POSIX description reads to me as if the authors just didn't think
about whether assigning to NF, or to user-defined function names, should
be permitted. The behavior of adding or removing fields when NF is
modified by assignment is, I suggest, something that should be stated
explicitly.
[...]
Post by Kaz Kylheku
Post by Keith Thompson
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
"""
NF
The number of fields in the current record. Inside a BEGIN action,
the use of NF is undefined unless a getline function without a var
argument is executed previously. Inside an END action, NF shall
retain the value it had for the last record read, unless a
subsequent, redirected, getline function without a var argument is
performed prior to entering the END action.
"""
This looks defective. The value of NF observed in END must obviously
be the last stored one, however it was stored, whether by assignment
or getline.
Note that NF is also recalculated if $0 is assigned, which is
explicitly required in the document; it is glaringly defective for
the text to appear to make an exception for getline but not for
assignment to $0.
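The $0 case can be sketched as follows. The input line and the shell variable name `resplit` are made up for illustration, and the END case is shown without a claimed result since that is the point in question.

```sh
# Assigning $0 re-splits the record, so NF changes without any getline.
resplit=$(printf 'a b c\n' | awk '{ $0 = "x y"; print NF }')
echo "$resplit"

# NF as seen from END after the last action assigned $0.
# No expected output is claimed here.
printf 'a b c\n' | awk '{ $0 = "x y" } END { print NF }'
```

The first command prints `2`, since the record is re-split into `x` and `y`.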
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */