Discussion:
Breaking a table of record rows into an array
Mr. Man-wai Chang
2024-03-01 13:33:55 UTC
I am new to Awk programming.

Given a text table with the following sample entry:

[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6]
frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]

How do you use Awk to quickly & easily break it into:

bssid="04:9F:xx:xx:xx:xx";
ssid[bssid]="[HOME]";
channel[bssid]="6";
frequency[bssid]="2437";
....
rate[bssid]="450";
enc[bssid]="Group-AES-CCMP CCMP PSK2";
Janis Papanagnou
2024-03-01 14:52:42 UTC
Post by Mr. Man-wai Chang
I am new to Awk programming.
[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6]
frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]
Is that all on one line? (If it's on multiple lines you should
provide more context, in particular how multiple records are
separated from each other.)
The nasty thing is the nested '[...]'.

One quick way is to choose an appropriate field separator. For
example

BEGIN { FS="] " }
{ for (i=1; i<=NF; i++)
    print $i
}

will produce on one data line like the above (it also works if
the data is spread across three lines, but you still need to
know the record separators then)...

[ 8
SSID[ [HOME]
BSSID[04:9F:xx:xx:xx:xx
channel[ 6]
frequency[2437
numsta[1
rssi[-63
noise[-75
beacon[98
cap[1411]
dtim[0
rate[450
enc[Group-AES-CCMP CCMP PSK2

If the basic splitting is okay you can do the formatting;
using sub() or gsub() on $i to remove/replace parts of the
text (e.g. to remove undesired spaces), use string
concatenation (e.g. to add the "]" again which had been
removed with the field splitting), etc., whatever needed.
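
A minimal sketch of that follow-up step, assuming each record sits on a single line (the record-separator question above is still open) and that a field name is whatever precedes the first '['. The function name parse_scan, the data[] array and the output layout are illustrative choices, not something from the original posts. Awk cannot create arrays by computed name, so the sketch stores values under a combined key data[bssid, name] and prints them in the requested form.

```shell
parse_scan() {
  awk '
  BEGIN { FS = "] " }
  {
      $0 = $0 " "                      # re-split so the final "]" is also followed by a space
      n = 0; bssid = ""
      for (i = 1; i <= NF; i++) {
          p = index($i, "[")
          if (p < 2) continue          # skip the leading "[ 8" and the empty last field
          name  = tolower(substr($i, 1, p - 1))
          value = substr($i, p + 1)
          gsub(/^ +| +$/, "", value)   # trim surrounding spaces
          if (name == "bssid") bssid = value
          else { n++; names[n] = name; values[n] = value }
      }
      printf "bssid=\"%s\"\n", bssid
      for (k = 1; k <= n; k++) {
          data[bssid, names[k]] = values[k]   # kept for later lookups
          printf "%s[%s]=\"%s\"\n", names[k], bssid, values[k]
      }
  }'
}

printf '%s\n' '[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6] frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411] dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]' | parse_scan
```

An SSID that itself contains "] " would still break the split, so this is only a starting point.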

Janis
Post by Mr. Man-wai Chang
bssid="04:9F:xx:xx:xx:xx";
ssid[bssid]="[HOME]";
channel[bssid]="6";
frequency[bssid]="2437";
....
rate[bssid]="450";
enc[bssid]="Group-AES-CCMP CCMP PSK2";
Mr. Man-wai Chang
2024-03-01 16:26:12 UTC
Post by Janis Papanagnou
The nasty thing is the nested '[...]'.
One quick way is to choose an appropriate field separator. For
example
Even more nasty is that wifi SSID can use any kind of printable
characters, INCLUDING Unicode! :)

Some hardware manufacturers like Cisco do restrict the printable
characters you can use in setting the SSID.
Mr. Man-wai Chang
2024-03-11 17:41:32 UTC
Post by Janis Papanagnou
BEGIN { FS="] " }
{ for (i=1; i<=NF; i++)
print $i
}
Use of `NF` in awk command - Stack Overflow
https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command
Keith Thompson
2024-03-11 18:46:41 UTC
Post by Mr. Man-wai Chang
Post by Janis Papanagnou
BEGIN { FS="] " }
{ for (i=1; i<=NF; i++)
print $i
}
Use of `NF` in awk command - Stack Overflow
https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command
That's a question about code that overwrites the value of NF.
How is it relevant?
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-03-11 23:08:17 UTC
Post by Mr. Man-wai Chang
Post by Janis Papanagnou
BEGIN { FS="] " }
{ for (i=1; i<=NF; i++)
print $i
}
Use of `NF` in awk command - Stack Overflow
So what?

You want a more cryptic way? - Here it is...

BEGIN { FS="] " ; OFS="\n" }
{ NF=NF } 1

or

BEGIN { FS="] " ; OFS="\n" }
{ $1=$1 } 1


Mind, though, that for a program skeleton to solve your task
my original code is easier to adjust for your data processing.
You are aware that it's just the first step and needs further
processing, aren't you?

Janis
Ed Morton
2024-03-12 22:21:09 UTC
Post by Mr. Man-wai Chang
   BEGIN { FS="] " }
   { for (i=1; i<=NF; i++)
       print $i
   }
Use of `NF` in awk command - Stack Overflow
https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command
Why did you post that link to an apparently unrelated question which has
all wrong answers (or incomplete at best - the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value)?

Please always provide enough context in your posts for us to be able to
understand why you're posting.

Ed.
Aharon Robbins
2024-03-13 09:21:44 UTC
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.

Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
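
A quick way to check this on a given awk, offered as a sketch rather than a statement about what POSIX requires. The helper name nf_demo is made up for illustration; the outputs in the comments are what gawk and mawk print.

```shell
# Set NF to the value given as the first argument, then print the rebuilt record.
nf_demo() {
  awk -v OFS=: -v n="$1" '{ NF = n; print $0 }'
}

echo 'a b c d' | nf_demo 2    # a:b        (fields 3 and 4 are discarded)
echo 'a b c d' | nf_demo 6    # a:b:c:d::  (two empty fields are appended)
```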
Keith Thompson
2024-03-13 16:22:35 UTC
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
I don't see that in the POSIX specification.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
"""
NF
The number of fields in the current record. Inside a BEGIN action,
the use of NF is undefined unless a getline function without a var
argument is executed previously. Inside an END action, NF shall
retain the value it had for the last record read, unless a
subsequent, redirected, getline function without a var argument is
performed prior to entering the END action.
"""

I don't see an explicit statement that assigning to NF has undefined
behavior. The last sentence seems to imply, if taken literally, that
assigning to NF doesn't change its value, at least within an END
section. Perhaps it's merely an oversight, or perhaps I've missed
something.

Do you see something in POSIX that defines the behavior of assigning to
NF?
Kaz Kylheku
2024-03-13 18:24:37 UTC
Post by Keith Thompson
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
I don't see that in the POSIX specification.
The key is this:

References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.

NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.

That implies it must cease to exist; i.e. be destroyed. If setting NF = 4 were
to restore $4 then that would mean it had continued to exist, but was only
hidden.

The behavior is present in GNU Awk, Mawk, BusyBox Awk and others.

I reproduced the behavior carefully in the awk macro of TXR Lisp:

$ echo '1 2 3 4' | txr -e '(awk (t (set nf 1) (set nf 3) (prn [f 1])))'

$ echo '1 2 3 4' | txr -e '(awk (t (set nf 3) (prn [f 1])))'
2
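
For comparison, a plain awk sketch of the same shrink-then-grow test. The function name nf_shrink_grow is hypothetical, and the output in the comment is what gawk and mawk print; other implementations would need to be checked individually.

```shell
# Shrink the record to three fields, grow it back to four, then show $4.
nf_shrink_grow() {
  awk '{ NF = 3; NF = 4; print "[" $4 "]" }'
}

echo '1 2 3 4' | nf_shrink_grow    # []  (the old $4 is gone, not merely hidden)
```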
Post by Keith Thompson
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
"""
NF
The number of fields in the current record. Inside a BEGIN action,
the use of NF is undefined unless a getline function without a var
argument is executed previously. Inside an END action, NF shall
retain the value it had for the last record read, unless a
subsequent, redirected, getline function without a var argument is
performed prior to entering the END action.
This looks defective. The value of NF observed in END must obviously
be the last stored one, however it was stored, whether by assignment
or getline.

Note that NF is also recalculated if $0 is assigned, which is
explicitly required in the document; it is glaringly defective to
be appearing to be making an exception for getline but not for
assignment to $0.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Keith Thompson
2024-03-13 21:15:56 UTC
Post by Kaz Kylheku
Post by Keith Thompson
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
I don't see that in the POSIX specification.
References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.
NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
That describes what happens if NF is modified by assignment, but I don't
see that it implies that such an assignment is allowed.
Post by Kaz Kylheku
That implies it must cease to exist; i.e. be destroyed. If setting NF = 4 were
to restore $4 then that would mean it had continued to exist, but was only
hidden.
The behavior is present in GNU Awk, Mawk, BusyBox Awk and others.
I accept that most, quite possibly all, implementations of Awk allow
assignment to NF, with the semantics of dropping fields after $NF or
adding new fields if the value decreases or increases, respectively.
And on the basis of that, I accept that POSIX *should* specify the
behavior of assigning to NF -- especially if the original AWK book
defines it. The second edition briefly mentions modifying NF:
"Conversely, if NF changes, $0 is recomputed when its value is needed."

But I can imagine a hypothetical awk-like language in which assigning to
NF has undefined behavior. My question is, how does the POSIX
specification not describe that language?

Looking more closely at
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
it can be argued that assigning to NF *is* well defined, but it could be
much clearer. The syntax for a simple assignment is:
lvalue '=' expr
where an lvalue is one of:
NAME
NAME '[' expr_list ']'
'$' expr
and:
The token NAME shall consist of a word that is not a keyword or a
name of a built-in function and is not followed immediately (without
any delimiters) by the '(' character.

Which implies that, for example, `NF = 10` is valid.

Also, NF is a "special variable", which weakly implies that it's
assignable.

On the other hand, it also implies that `foo = 42` is valid where `foo`
is the name of a user-defined function (gawk disallows it). It should
say that the name of a user-defined function is not an lvalue.

The POSIX description reads to me as if the authors just didn't think
about whether assigning to NF, or to user-defined function names, should
be permitted. The behavior of adding or removing fields when NF is
modified by assignment is, I suggest, something that should be stated
explicitly.

[...]
Post by Kaz Kylheku
Post by Keith Thompson
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
"""
NF
The number of fields in the current record. Inside a BEGIN action,
the use of NF is undefined unless a getline function without a var
argument is executed previously. Inside an END action, NF shall
retain the value it had for the last record read, unless a
subsequent, redirected, getline function without a var argument is
performed prior to entering the END action.
This looks defective. The value of NF observed in END must obviously
be the last stored one, however it was stored, whether by assignment
or getline.
Note that NF is also recalculated if $0 is assigned, which is
explicitly required in the document; it is glaringly defective to
be appearing to be making an exception for getline but not for
assignment to $0.
Kaz Kylheku
2024-03-13 21:49:26 UTC
Post by Keith Thompson
Post by Kaz Kylheku
Post by Keith Thompson
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
I don't see that in the POSIX specification.
References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.
NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
That describes what happens if NF is modified by assignment, but I don't
see that it implies that such an assignment is allowed.
"The left-hand side of an assignment and the target of increment and
decrement operators can be one of a variable, an array with index, or a
field selector."

NF is described as a variable. Some unique remarks are made about NF,
but none deny that it's assignable like any other variable.
Post by Keith Thompson
But I can imagine a hypothetical awk-like language in which assigning to
NF has undefined behavior. My question is, how does the POSIX
specification not describe that language?
That language is failing to support an instance of a variable
being the left operand of an assignment, which a variable "can be".

It looks like the violation of a requirement.
Post by Keith Thompson
On the other hand, it also implies that `foo = 42` is valid where `foo`
is the name of a user-defined function (gawk disallows it).
POSIX does say that "[t]he same name shall not be used as both a
function parameter name and as the name of a function or a special awk
variable." So foo = 42 isn't valid if foo is already a function.

Also: "The same name shall not be used both as a variable name with
global scope and as the name of a function. The same name shall not be
used within the same scope both as a scalar variable and as an array."

All that said, the business of the NF tail wagging the $1, $2, ...
legs of the dog should be the target of at least one clarifying remark,
and the other defects should also be corrected:

- In a BEGIN clause NF should be undefined unless any action
whatsoever is executed that sets its value: direct assignment,
use of getline or assignment to $0.

- At the start of the execution of an END clause, NF retains
its current value (or undefined status, if it was never set);
the END clause has no implicit effect on NF.
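
A small sketch of the two cases the current text does cover: NF after a plain getline in BEGIN, and NF carried into END. The function name nf_begin_end is made up for illustration.

```shell
# BEGIN reads the first record via getline; END sees NF from the last record read.
nf_begin_end() {
  awk 'BEGIN { getline; print "BEGIN:", NF } END { print "END:", NF }'
}

printf 'a b c\nd e\n' | nf_begin_end
# BEGIN: 3
# END: 2
```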
Ed Morton
2024-03-13 23:45:41 UTC
Post by Kaz Kylheku
Post by Keith Thompson
Post by Kaz Kylheku
Post by Keith Thompson
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
I don't see that in the POSIX specification.
References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.
NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
That's a bit like the argument from an old episode of the comedy TV show
"Yes, Prime Minister" in the UK where his aide says (paraphrased) "Some
country has done X, we must do something. War is something, therefore we
must go to war".

Being able to set NF to 3 does not mean you must delete $4. Why not
delete $1 or $2 instead? You'd still end up with 3 fields to satisfy the
value of NF. Lots of things you can do are undefined by POSIX despite
how sensible some impacts may seem, assigning a value to NF is just 1
more of them.

You could say that "$0 holds the last record read, you can use $0 in the
END section, therefore in the END section $0 must contain the value of
the last record read". Except that's not true. From the gawk manual
(https://www.gnu.org/software/gawk/manual/html_node/I_002fO-And-BEGIN_002fEND.html#I_002fO-And-BEGIN_002fEND):

----
Most probably due to an oversight, the standard does not say that $0 is
also preserved, although logically one would think that it should be. In
fact, all of BWK awk, mawk, and gawk preserve the value of $0 for use in
END rules. Be aware, however, that some other implementations and many
older versions of Unix awk do not.
----
Post by Kaz Kylheku
Post by Keith Thompson
That describes what happens if NF is modified by assignment, but I don't
see that it implies that such an assignment is allowed.
"The left-hand side of an assignment and the target of increment and
decrement operators can be one of a variable, an array with index, or a
field selector."
NF is described as a variable. Some unique remarks are made about NF,
but none deny that it's assignable like any other variable.
Post by Keith Thompson
But I can imagine a hypothetical awk-like language in which assigning to
NF has undefined behavior. My question is, how does the POSIX
specification not describe that language?
That language is failing to support an instance of a variable
being the left operand of an assignment, which a variable "can be".
It looks like the violation of a requirement.
Post by Keith Thompson
On the other hand, it also implies that `foo = 42` is valid where `foo`
is the name of a user-defined function (gawk disallows it).
POSIX does say that "[t]he same name shall not be used as both a
function parameter name and as the name of a function or a special awk
variable." So foo = 42 isn't valid if foo is already a function.
Also: "The same name shall not be used both as a variable name with
global scope and as the name of a function. The same name shall not be
used within the same scope both as a scalar variable and as an array."
All that said, the business of the NF tail wagging the $1, $2, ...
legs of the dog should be the target of at least one clarifying remark,
- In a BEGIN clause NF should be undefined unless any action
whatsoever is executed that sets its value: direct assignment,
use of getline or assignment to $0.
- At the start of the execution of an END clause, NF retains
its current value (or undefined status, if it was never set);
the END clause has no implicit effect on NF.
All of the above claims that POSIX states you can assign a value to NF.
That may or may not be correct, I expect it is but I don't care because
nothing above nor in the POSIX spec states what the IMPACT is of
assigning a value to NF. As far as I can see there is absolutely nothing
in the POSIX spec that says anything like "if you set NF to a higher
value fields will be created and if you set NF to a lower value fields
will be removed" but I'd honestly love to be proven wrong and shown the
section that does define the impact of assigning a higher or lower
value to NF.

Ed.
Kaz Kylheku
2024-03-14 00:17:48 UTC
Post by Ed Morton
Post by Kaz Kylheku
Post by Keith Thompson
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
I don't see that in the POSIX specification.
References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.
NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
That's a bit like the argument from an old episode of the comedy TV show
"Yes, Prime Minister"
But that show is the reference model for how ISO and IEEE standardization
works.
Post by Ed Morton
in the UK where his aide says (paraphrased) "Some
country has done X, we must do something. War is something, therefore we
must go to war".
Being able to set NF to 3 does not mean you must delete $4.
The passage says that fields do not exist beyond $NF. So if NF
is 3, $4 doesn't exist.
Post by Ed Morton
Why not
delete $1 or $2 instead?
You'd still end up with 3 fields to satisfy the
value of NF.
Because those are less than 3, the value in NF. Those exist.
$2 and $3 exist while NF is originally 4; and continue to
exist if it is decremented to 3. Why would $2 be victimized,
when at no point had NF been less than 2?
Keith Thompson
2024-03-14 01:34:27 UTC
Post by Kaz Kylheku
Post by Keith Thompson
Post by Kaz Kylheku
Post by Keith Thompson
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX and so will do different things in
different awk variants and even in 1 awk variant can behave differently
depending on whether you're setting it to a higher or lower than
original value
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
I don't see that in the POSIX specification.
References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.
NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
That describes what happens if NF is modified by assignment, but I don't
see that it implies that such an assignment is allowed.
"The left-hand side of an assignment and the target of increment and
decrement operators can be one of a variable, an array with index, or a
field selector."
NF is described as a variable. Some unique remarks are made about NF,
but none deny that it's assignable like any other variable.
OK, I concede. It can be inferred from the POSIX specification that
assigning to NF is allowed.

And the specification is in serious need of a definition of what
assigning to NF actually *does*, other than changing the value of NF.
Post by Kaz Kylheku
Post by Keith Thompson
But I can imagine a hypothetical awk-like language in which assigning to
NF has undefined behavior. My question is, how does the POSIX
specification not describe that language?
That language is failing to support an instance of a variable
being the left operand of an assignment, which a variable "can be".
It looks like the violation of a requirement.
Agreed. I think.

[...]
Kaz Kylheku
2024-03-14 00:22:56 UTC
Post by Keith Thompson
That describes what happens if NF is modified by assignment, but I don't
see that it implies that such an assignment is allowed.
Here is a problem. In numerous implementations, when you set NF, not
only does that set the number of fields, but $0 is recomputed.
So instead of $1=$1 you can use NF=NF.

$ echo '1 2 3 4' | awk -v OFS=: '{ NF=NF; print $0; }'
1:2:3:4

$ echo '1 2 3 4' | awk -v OFS=: '{ NF=2; print $0; }'
1:2


We can continue to infer that if setting NF causes certain fields to
exist, and not others, then $0 must be reconstituted accordingly,
just like when a field is assigned, according to the idea that Awk
implements a kind of "reactive programming" paradigm whereby $0
and the fields are kept in sync.

But that's going a little uncomfortably far on the proverbial limb,
without assurance from the text.
Aharon Robbins
2024-03-14 06:19:40 UTC
Post by Keith Thompson
Do you see something in POSIX that defines the behavior of assigning to
NF?
In the section "Variables and Special Values"

| References to nonexistent fields (that is, fields after $NF), shall
| evaluate to the uninitialized value. Such references shall not create
| new fields. However, assigning to a nonexistent field (for example,
| $(NF+2)=5) shall increase the value of NF; create any intervening fields
| with the uninitialized value; and cause the value of $0 to be
| recomputed, with the fields being separated by the value of OFS. Each
| field variable shall have a string value or an uninitialized value when
| created.

It doesn't say what happens when you do NF -= 2; nonetheless, all
traditional awks throw away fields when you do something like that.
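
The case that the quoted passage does spell out can be sketched as follows. The function name extend_fields is hypothetical; the behaviour itself (NF grows, intervening fields are created empty, $0 is rebuilt with OFS) is the one the POSIX text describes.

```shell
# Assign to a field two positions past the end of the record.
extend_fields() {
  awk -v OFS=- '{ $(NF + 2) = 5; print NF; print $0 }'
}

echo 'a b' | extend_fields
# 4
# a-b--5
```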
Keith Thompson
2024-03-14 06:43:25 UTC
Post by Aharon Robbins
Post by Keith Thompson
Do you see something in POSIX that defines the behavior of assigning to
NF?
In the section "Variables and Special Values"
| References to nonexistent fields (that is, fields after $NF), shall
| evaluate to the uninitialized value. Such references shall not create
| new fields. However, assigning to a nonexistent field (for example,
| $(NF+2)=5) shall increase the value of NF; create any intervening fields
| with the uninitialized value; and cause the value of $0 to be
| recomputed, with the fields being separated by the value of OFS. Each
| field variable shall have a string value or an uninitialized value when
| created.
It doesn't say what happens when you do NF -= 2; nonetheless, all
traditional awks throw away fields when you do something like that.
Kaz already addressed this. It's not sufficiently explicit about this
behavior, but:

""" Kaz:
The key is this:

References to nonexistent fields (that is, fields after $NF), shall
evaluate to the uninitialized value.

NF is assignable, and fields after $NF do not exist. Thus if we
have four fields and set NF = 3, then $4 doesn't exist.
"""

(At the time I wasn't convinced that POSIX requires NF to be
assignable.)
Ed Morton
2024-03-14 10:50:11 UTC
Post by Aharon Robbins
Post by Keith Thompson
Do you see something in POSIX that defines the behavior of assigning to
NF?
In the section "Variables and Special Values"
| References to nonexistent fields (that is, fields after $NF), shall
| evaluate to the uninitialized value. Such references shall not create
| new fields. However, assigning to a nonexistent field (for example,
| $(NF+2)=5) shall increase the value of NF; create any intervening fields
| with the uninitialized value; and cause the value of $0 to be
| recomputed, with the fields being separated by the value of OFS. Each
| field variable shall have a string value or an uninitialized value when
| created.
It doesn't say what happens when you do NF -= 2; nonetheless, all
traditional awks throw away fields when you do something like that.
It doesn't say what happens when you do NF += 2 either. All I'm saying
is that changing the value of NF is undefined behavior per POSIX.

I'm not sure which awks would be considered "traditional" vs otherwise
but AFAIK POSIX is descriptive, i.e. describes how X behaves rather than
dictates the behavior of X, so if the appropriate set of awk variants
all behave the same way for any behavior such as this that's currently
undefined by POSIX (changing the value of NF, the value of $0 in the end
section, and field splitting with a null FS being the 3 most commonly
used cases IMO) then maybe the folks who write that spec could/should
update it to describe that behavior but I don't know which awks all
behave the same way for those cases, nor if that's enough of them for
POSIX to make a definition.

Ed.
Ed Morton
2024-03-14 12:09:59 UTC
Post by Ed Morton
Post by Aharon Robbins
Post by Keith Thompson
Do you see something in POSIX that defines the behavior of assigning to
NF?
In the section "Variables and Special Values"
| References to nonexistent fields (that is, fields after $NF), shall
| evaluate to the uninitialized value. Such references shall not create
| new fields. However, assigning to a nonexistent field (for example,
| $(NF+2)=5) shall increase the value of NF; create any intervening fields
| with the uninitialized value; and cause the value of $0 to be
| recomputed, with the fields being separated by the value of OFS. Each
| field variable shall have a string value or an uninitialized value when
| created.
It doesn't say what happens when you do NF -= 2; nonetheless, all
traditional awks throw away fields when you do something like that.
It doesn't say what happens when you do NF += 2 either. All I'm saying
is that changing the value of NF is undefined behavior per POSIX.
I'm not sure which awks would be considered "traditional" vs otherwise
but AFAIK POSIX is descriptive, i.e. describes how X behaves rather than
dictates the behavior of X, so if the appropriate set of awk variants
all behave the same way for any behavior such as this that's currently
undefined by POSIX (changing the value of NF, the value of $0 in the end
section, and field splitting with a null FS being the 3 most commonly
used cases IMO) then maybe the folks who write that spec could/should
update it to describe that behavior but I don't know which awks all
behave the same way for those cases, nor if that's enough of them for
POSIX to make a definition.
    Ed.
I couldn't find any existing tickets so I just created tickets with the
Austin Group to request that definitions for the 3 cases I listed above
be added to the POSIX spec:

1) Changing the value of NF =
https://www.austingroupbugs.net/view.php?id=1820
2) The value of $0, $1, etc. in an END section =
https://www.austingroupbugs.net/view.php?id=1821
3) Splitting using a null field separator =
https://www.austingroupbugs.net/view.php?id=1822

Obviously I've no idea if they'll be implemented or not but AFAIK it
doesn't hurt to ask. I said "in most modern awks..." in each of them, if
anyone knows which specific awks behave in the ways I described (or
which don't) then feel free to comment on the issues if you can, I just
don't have access to multiple awk variants at this time.
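
Until such definitions exist, a script can sidestep cases 2 and 3 entirely. The following is only a sketch under that assumption; the function name last_record_chars is made up for illustration. It copies each record into an ordinary variable instead of reading $0 in END, and walks characters with substr() instead of splitting on a null FS.

```shell
last_record_chars() {
  awk '
  { last = $0 }                        # remember each record; do not rely on $0 in END
  END {
      n = length(last)
      for (i = 1; i <= n; i++)         # one character at a time, no FS="" needed
          printf "%s%s", substr(last, i, 1), (i < n ? " " : "\n")
  }'
}

printf 'abc\nxyz\n' | last_record_chars    # x y z
```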

Regards,

Ed.
Ed Morton
2024-03-14 12:32:45 UTC
Post by Ed Morton
Post by Ed Morton
Post by Aharon Robbins
Post by Keith Thompson
Do you see something in POSIX that defines the behavior of assigning to
NF?
In the section "Variables and Special Values"
| References to nonexistent fields (that is, fields after $NF), shall
| evaluate to the uninitialized value. Such references shall not create
| new fields. However, assigning to a nonexistent field (for example,
| $(NF+2)=5) shall increase the value of NF; create any intervening fields
| with the uninitialized value; and cause the value of $0 to be
| recomputed, with the fields being separated by the value of OFS. Each
| field variable shall have a string value or an uninitialized value when
| created.
It doesn't say what happens when you do NF -= 2; nonetheless, all
traditional awks throw away fields when you do something like that.
It doesn't say what happens when you do NF += 2 either. All I'm saying
is that changing the value of NF is undefined behavior per POSIX.
I'm not sure which awks would be considered "traditional" vs otherwise,
but AFAIK POSIX is descriptive, i.e. it describes how X behaves rather
than dictating the behavior of X. So if the appropriate set of awk
variants all behave the same way for a behavior like this that POSIX
currently leaves undefined (changing the value of NF, the value of $0 in
the END section, and field splitting with a null FS being the 3 most
commonly used cases IMO), then maybe the folks who write that spec
could/should update it to describe that behavior. I don't know which
awks all behave the same way for those cases, though, nor whether that's
enough of them for POSIX to make a definition.
     Ed.
I couldn't find any existing tickets so I just created tickets with the
Austin Group to request that definitions for the 3 cases I listed above
be added to the POSIX spec:
1) Changing the value of NF =
https://www.austingroupbugs.net/view.php?id=1820
2) The value of $0, $1, etc. in an END section =
https://www.austingroupbugs.net/view.php?id=1821
3) Splitting using a null field separator =
https://www.austingroupbugs.net/view.php?id=1822
I just added a final ticket from me for the other undefined behavior I
fairly often see people relying on (e.g. when creating multi-line
records by reading 1 line at a time to handle quoted fields that include
newlines without gawk --csv):

4) Changing the value of NR or FNR =
https://www.austingroupbugs.net/view.php?id=1823
Post by Ed Morton
Obviously I've no idea if they'll be implemented or not, but AFAIK it
doesn't hurt to ask. I said "in most modern awks..." in each of them, so
if anyone knows which specific awks behave in the ways I described (or
which don't), feel free to comment on the issues if you can. I just
don't have access to multiple awk variants at this time.
Regards,
    Ed.
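
As an aside on case 4, multi-line records can be assembled without assigning to NR or FNR at all, by keeping a separate counter. The following is a minimal sketch under stated assumptions: a POSIX shell, CSV-style input where a quoted field may contain newlines and embedded quotes are doubled. `join_quoted`, `rec`, `pending` and `recno` are made-up names for the example.

```shell
join_quoted() {
    awk '
    {
        # Append this physical line to the record being assembled.
        rec = (pending ? rec "\n" : "") $0
        # gsub() returns the number of quotes it replaced; work on a copy.
        tmp = rec
        pending = (gsub(/"/, "", tmp) % 2)   # odd count: a quoted field is still open
        if (!pending) {
            recno++                          # separate counter; NR is left alone
            printf "record %d: %s\n", recno, rec
        }
    }'
}

printf 'a,"x\ny",b\nc,d\n' | join_quoted
```

Here the first two input lines are reported as record 1 and the third line as record 2, while NR keeps counting physical lines as usual.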
Ed Morton
2024-03-13 23:27:30 UTC
Permalink
Post by Aharon Robbins
Post by Ed Morton
the effect of setting `NF` is
undefined behavior per POSIX, so it will do different things in
different awk variants, and even a single awk variant can behave
differently depending on whether you're setting it to a higher or lower
value than the original
This is not true. The effect of setting NF was well defined
by the original awk book and also in POSIX.
Decreasing NF throws away fields. Increasing NF adds the
intervening fields with the null string as their values
and rebuilds the record.
Arnold - I don't know about the original awk book but POSIX
(https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html)
only defines what happens if you populate $X, not what happens if you
populate NF. If you set $X awk rebuilds the record and if X is some
value higher than the current value of NF then awk adds the intervening
fields with the null string as their values, but POSIX doesn't specify
what happens if you set NF to any value.

If I'm wrong about that I'd love for you or anyone else to point me to
the section that defines it as I've scoured the standard several times
looking for it over the years.
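
To illustrate the distinction: assigning to a field beyond NF is the case the quoted POSIX text does cover, and trailing fields can be dropped without assigning to NF by rebuilding the output string. A minimal sketch, assuming a POSIX shell; the function names are made up for the example.

```shell
# Assigning to $(NF+2): NF grows, the intervening field is empty,
# and $0 is rebuilt with OFS between the fields.
assign_beyond_nf() {
    printf 'a b\n' | awk -v OFS=- '{ $(NF + 2) = 5; print NF; print $0 }'
}

# Drop the last two fields without touching NF: join $1..$(NF-2) by hand.
drop_last_two() {
    awk '{
        out = ""
        for (i = 1; i <= NF - 2; i++)
            out = out (i > 1 ? OFS : "") $i
        print out
    }'
}

assign_beyond_nf                      # prints 4, then a-b--5
printf 'a b c d\n' | drop_last_two    # prints a b
```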

Ed.
j***@invalid.invalid
2024-03-01 15:59:59 UTC
Permalink
I am new to Awk programming.
[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6]
frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]
bssid="04:9F:xx:xx:xx:xx";
ssid[bssid]="[HOME]";
channel[bssid]="6";
frequency[bssid]="2437";
....
rate[bssid]="450";
enc[bssid]="Group-AES-CCMP CCMP PSK2";
Found your issue interesting enough to attempt a solution:


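#../sandbox/test.awk
# Records end at "]"; fields split on "[" plus any following blanks or "[".
BEGIN { FS = "\\[[ []*"; RS = "]" }
{
    sub("\n", "")                     # drop a newline left by a wrapped input line
    for (i = 1; i <= NF; i += 2) {
        if ($i ~ /^$/)
            $i = "Station"            # the leading "[ 8]" entry has no name
        else
            sub(/^ */, "\t", $i)      # replace leading blanks in the name with a tab
        if ($(i+1) != "")
            printf "%s[bssid] = %s\n", $i, $(i+1)
    }
}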

$ nawk -f test.awk test.data
Station[bssid] = 8
SSID[bssid] = HOME
BSSID[bssid] = 04:9F:xx:xx:xx:xx
channel[bssid] = 6
frequency[bssid] = 2437
numsta[bssid] = 1
rssi[bssid] = -63
noise[bssid] = -75
beacon[bssid] = 98
cap[bssid] = 1411
dtim[bssid] = 0
rate[bssid] = 450
enc[bssid] = Group-AES-CCMP CCMP PSK2
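
The output above prints the literal text "[bssid]". To get closer to what was originally asked for, the values can be stored under the actual BSSID. The following is only a sketch under stated assumptions: a POSIX shell, one complete entry per input line, and balanced brackets. `parse_scan` and the `data` array are made-up names, and since awk cannot create arrays named at run time, a single array with a composite (bssid, name) key stands in for the separate ssid[], channel[], ... arrays. The unnamed leading "[ 8]" entry is skipped.

```shell
parse_scan() {
    awk '
    {
        rest = $0; n = 0; bssid = ""
        # Find each "name[" and then scan forward to its matching "]".
        while (match(rest, /[A-Za-z]+\[/)) {
            name = tolower(substr(rest, RSTART, RLENGTH - 1))
            rest = substr(rest, RSTART + RLENGTH)
            depth = 1
            for (j = 1; j <= length(rest) && depth > 0; j++) {
                c = substr(rest, j, 1)
                if (c == "[") depth++
                else if (c == "]") depth--
            }
            val = substr(rest, 1, j - 2)     # text before the matching "]"
            rest = substr(rest, j)
            gsub(/^ +| +$/, "", val)         # trim surrounding blanks
            names[++n] = name; vals[n] = val
            if (name == "bssid") bssid = val
        }
        if (n == 0) next
        printf "bssid=\"%s\";\n", bssid
        for (k = 1; k <= n; k++) {
            if (names[k] == "bssid") continue
            data[bssid, names[k]] = vals[k]  # one array, composite key
            printf "%s[bssid]=\"%s\";\n", names[k], data[bssid, names[k]]
        }
    }'
}

printf '%s\n' '[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6] frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411] dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]' | parse_scan
```

Counting bracket depth by hand is what lets the nested "[ [HOME]]" value come through as "[HOME]" instead of being cut at the first "]".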
Mr. Man-wai Chang
2024-03-01 16:23:22 UTC
Permalink
I am new to Awk programming.
Being new to Awk programming, I am amazed to learn that Awk can
automatically use a string as an array index. There is also automatic
type conversion, very much like Visual FoxPro and the other dBase
dialects I am more fluent in! :)

But no dBase dialect can directly use a string as an array index.
You can work around it using macro substitution, but that's not a direct
solution like Awk's arrays.
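
A tiny illustration of both points, assuming a POSIX shell (`string_index_demo` is a made-up name): awk array subscripts are always strings, and a string value is converted to a number when used in arithmetic.

```shell
string_index_demo() {
    awk 'BEGIN {
        channel["04:9F:xx:xx:xx:xx"] = "6"       # a string used directly as the subscript
        print channel["04:9F:xx:xx:xx:xx"] + 1   # the string "6" is converted to a number
        seen[1] = "yes"                          # the number 1 becomes the subscript "1"
        print (("1" in seen) ? "same element" : "different element")
    }'
}

string_index_demo    # prints 7, then "same element"
```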