Discussion:
"sed" question
Grant Taylor
2024-03-08 02:38:28 UTC
I know that's what awk does, but I don't think I would have expected
it if I didn't know about it.
Okay. I think that's a fair observation.
$0 is the current input line.
Or $0 is the current /record/ in awk parlance.
If you don't change anything, or if you modify $0 itself, whitespace
between fields is preserved.
If you modify any of the fields, $0 is recomputed and whitespace
between tokens is collapsed.
I don't agree with that.

% echo 'one   two  three' | awk '{print $0; print $1,$2,$3}'
one   two  three
one two three

I didn't /modify/ anything and awk does print the fields with different
white space.
awk *could* have been defined to preserve inter-field whitespace even
when you modify individual fields,
I question the veracity of that. Specifically when lengthening or
shortening the value of a field. E.g. replacing "two" with "fifteen".
This is particularly germane when you look at $0 as a fixed width
formatted output.
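To illustrate that point with a sketch (hypothetical aligned input, not from the thread): once a field edit changes the field's length, the visual columns shift regardless of how awk treats the whitespace.

```shell
# Two visually aligned rows; replacing "two" with "fifteen" makes the
# columns shift no matter what awk does, and the rebuilt record is
# single-space (OFS) separated anyway.
printf 'one    two    three\nfour   five   six\n' |
awk '{ $2 = ($2 == "two") ? "fifteen" : $2; print }'
# prints:
#   one fifteen three
#   four five six
```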
and I think I would have found that more intuitive.
I don't agree.
(And ideally there would be a way to refer to that inter-field
whitespace.)
Remember, awk is meant for working on fields of data in a record. By
default, the fields are delimited by white space characters. I'll say
it this way, awk is meant for working on the non-white space characters.
Or yet another way, awk is not meant for working on white space characters.
The fact that modifying a field has the side effect of messing up $0
seems counterintuitive.
Maybe.

But I think it's one that is acceptable for what awk is intended to do.
Perhaps the behavior matches your intuition better than it matches
mine.
I sort of feel like you are wanting to / trying to use awk in places
where sed might be better. sed just sees a string of text and is
ignorant of any structure without a carefully crafted RE to provide it.

Conversely awk is quite happy working with an easily identified field
based on the count with field separators of one or more white space
characters.

Consider the output of `netstat -an` wherein you have multiple columns
of IP addresses.

Please find a quick way, preferably that doesn't involve negation
(because what needs to be negated may be highly dynamic) that lists
inbound SMTP connections on an email server but doesn't list outbound
SMTP connections.

awk makes it trivial to identify and print records that have the SMTP
port in the local IP column, thus ignoring outbound connections with
SMTP in the remote column.
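A sketch of that netstat idea, using made-up sample rows (the Linux `netstat -an` column layout is assumed here, with the local address in field 4; other systems put it elsewhere):

```shell
# Keep only rows whose *local* address ends in :25 (SMTP); a plain
# grep for ':25' would also match the outbound row's remote column.
printf '%s\n' \
  'tcp 0 0 192.0.2.1:25 198.51.100.7:51234 ESTABLISHED' \
  'tcp 0 0 192.0.2.1:40000 203.0.113.9:25 ESTABLISHED' |
awk '$4 ~ /:25$/'
# prints only the first (inbound) row
```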

Aside: Yes, I know that ss and the likes have more features for this,
but this is my example and ss is not installed everywhere.

I sort of view awk as somewhat akin to SQL wherein fields in awk are
like columns in SQL.

I'd be more than a little bit surprised to find an SQL interface that
preserved white space /between/ columns. -- Many will do it /within/
columns.

awk makes it trivial to take field oriented output from commands and
apply some logic / parsing / action on specific fields in records.
(And perhaps this should be moved to comp.lang.awk if it doesn't die
out soon.
comp.lang.awk added and followup pointed there.
Though sed and awk are both languages in their own right
and tools that can be used from the shell, so I'd argue there's a
topicality overlap.)
;-)
--
Grant. . . .
Keith Thompson
2024-03-08 04:06:00 UTC
Post by Grant Taylor
I know that's what awk does, but I don't think I would have expected
it if I didn't know about it.
Okay. I think that's a fair observation.
$0 is the current input line.
Or $0 is the current /record/ in awk parlance.
Yes.
Post by Grant Taylor
If you don't change anything, or if you modify $0 itself, whitespace
between fields is preserved.
If you modify any of the fields, $0 is recomputed and whitespace
between tokens is collapsed.
I don't agree with that.
% echo 'one   two  three' | awk '{print $0; print $1,$2,$3}'
one   two  three
one two three
I didn't /modify/ anything and awk does print the fields with
different white space.
That's just the semantics of print with comma-delimited arguments, just
like:

% awk 'BEGIN{a="foo"; b="bar"; print a, b}'
foo bar

Printing the values of $1, $2, and $3 doesn't change $0. Writing to any
of $1, $2, $3, even with the same value, does change $0.

$ echo 'one   two  three' | awk '{print $0; print $1,$2,$3; print $0; $2 = $2; print $0}'
one   two  three
one two three
one   two  three
one two three
Post by Grant Taylor
awk *could* have been defined to preserve inter-field whitespace
even when you modify individual fields,
I question the veracity of that. Specifically when lengthening or
shortening the value of a field. E.g. replacing "two" with
"fifteen". This is particularly germane when you look at $0 as a fixed
width formatted output.
But awk doesn't work with fixed-width data. The length of each field,
and the length of $0, is variable.

If awk *purely* dealt with input lines only as lists of tokens, then
this:

echo 'one   two  three' | awk '{print $0}'

would print "one two three" rather than "one   two  three" (and awk would
lose the ability to deal with arbitrarily formatted input). The fact
that the inter-field whitespace is reset only when individual fields are
touched feels arbitrary to me.
Post by Grant Taylor
and I think I would have found that more intuitive.
I don't agree.
(And ideally there would be a way to refer to that inter-field
whitespace.)
Remember, awk is meant for working on fields of data in a record. By
default, the fields are delimited by white space characters. I'll say
it this way, awk is meant for working on the non-white space
characters. Or yet another way, awk is not meant for working on
white space characters.
Awk has strong builtin support for working on whitespace-delimited
fields, and that support tends to ignore the details of that whitespace.
But you can also write awk code that just deals with $0.

One trivial example:

awk '{ count += length + 1 } END { print count }'

behaves similarly to `wc -c`, and counts whitespace characters just like
any other characters.
Post by Grant Taylor
The fact that modifying a field has the side effect of messing up $0
seems counterintuitive.
Maybe.
But I think it's one that is acceptable for what awk is intended to do.
It's also the existing behavior, and changing it would break things, so
I wouldn't suggest changing it.
Post by Grant Taylor
Perhaps the behavior matches your intuition better than it matches
mine.
I sort of feel like you are wanting to / trying to use awk in places
where sed might be better. sed just sees a string of text and is
ignorant of any structure without a carefully crafted RE to provide it.
Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.

Awk is optimized for working on records consisting of fields, and not
caring much about how much whitespace there is between fields. But it's
flexible enough to do *lots* of other things.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
Mr. Man-wai Chang
2024-03-08 09:03:19 UTC
Post by Keith Thompson
Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.
Awk is optimized for working on records consisting of fields, and not
caring much about how much whitespace there is between fields. But it's
flexible enough to do *lots* of other things.
The original Awk doesn't support regular expressions, right? Because
regex was not yet talked about back then??
Janis Papanagnou
2024-03-08 14:46:20 UTC
Post by Mr. Man-wai Chang
The original Awk doesn't support regular expressions, right?
Where did you get that from? - Awk without regexps makes little sense;
mind that the basic syntax of Awk programs is described as
/pattern/ { action }
What would remain if there's no regexp patterns; string comparisons?
Post by Mr. Man-wai Chang
Because regex was not yet talked about back then??
Stable Awk (1985) was released in 1987. The (initial) old Awk (1977) was
released in 1979. Before that tool we had Sed (1974), and before that we
had Ed and Grep (1973). My perception is that regexps were there as a
basic concept of UNIX in all these tools, so why should Awk be exempt?
According to the authors Awk was designed to see how Sed and Grep could
be generalized.

Janis
Grant Taylor
2024-03-08 15:12:05 UTC
Post by Janis Papanagnou
Awk without regexps makes little sense;
I think this comes down to what is a regular expression and what is not
a regular expression.
Post by Janis Papanagnou
mind that the basic syntax of Awk programs is described as
pattern { action }
I'm guessing that 40-60% of the awk that I use doesn't use what I would
consider to be regular expressions.

(NF == 5){print $3}
(NF == 8){print $4}

Or:

{total+=$5}
END{print total}

I usually think of regular expressions when I'm doing a sub(/re/, ...)
type thing or a (... ~ /re/) type conditional. More specifically things
between the // in both of those statements are the REs.

Maybe I have an imprecise understanding / definition.
--
Grant. . . .
Mr. Man-wai Chang
2024-03-08 15:59:42 UTC
Post by Grant Taylor
I usually think of regular expressions when I'm doing a sub(/re/, ...)
type thing or a (... ~ /re/) type conditional. More specifically things
between the // in both of those statements are the REs.
Maybe I have an imprecise understanding / definition.
Do Linux and Unix have a ONE AND ONLY ONE STANDARD regex library?

It seemed that tools and programming languages have their own
implementations, let alone different versions among them.
Geoff Clare
2024-03-12 13:47:09 UTC
Post by Mr. Man-wai Chang
Do Linux and Unix have a ONE AND ONLY ONE STANDARD regex library?
It seemed that tools and programming languages have their own
implementations, let alone different versions among them.
In the POSIX/UNIX standard the functions used for handling regular
expressions are regcomp() and regexec() (and regerror() and regfree()).
They are in the C library, not a separate "regex library".

They support different RE flavours via flags. The standard requires
that "basic regular expressions" (default) and "extended regular
expressions" (with REG_EXTENDED flag) are supported. Implementations
can support other flavours with non-standard flags.

POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
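The difference between the two required flavours can be seen with grep, which uses BREs by default and EREs (the regcomp() REG_EXTENDED behaviour, and what awk uses) with -E:

```shell
# In a BRE, '+' is an ordinary character; in an ERE it means
# "one or more of the preceding".
echo 'aaa' | grep -c 'a+'    # BRE: literal "a+" not present -> prints 0
echo 'aaa' | grep -cE 'a+'   # ERE: one or more 'a' -> prints 1
```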
--
Geoff Clare <***@gclare.org.uk>
Aharon Robbins
2024-03-12 19:00:13 UTC
Post by Geoff Clare
POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
There is the additional requirement that \ inside [....] can
be used to escape characters, so that [abc\]def] is valid in
awk but not in other uses of REG_EXTENDED.
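A quick demonstration of that awk-specific allowance (the bracket expression below contains a, b, c, ], d, e and f):

```shell
# POSIX awk permits \] inside a bracket expression; plain REG_EXTENDED
# would treat the backslash as a literal character and end the class
# at the first unescaped ']'.
echo 'x]y' | awk '/[abc\]def]/ { print "matched" }'
# prints "matched" because ']' is in the class
```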
Geoff Clare
2024-03-13 13:57:33 UTC
Post by Aharon Robbins
Post by Geoff Clare
POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
There is the additional requirement that \ inside [....] can
be used to escape characters,
Yes, awk effectively has an extra "layer" of backslash escaping
before the ERE rules kick in, both inside and outside [....].
I didn't mention this so as not to overload the OP with
information - he seemed more interested in the different flavours
of RE than in nitty gritty details like that.
--
Geoff Clare <***@gclare.org.uk>
Janis Papanagnou
2024-03-09 02:07:24 UTC
Post by Grant Taylor
Post by Janis Papanagnou
Awk without regexps makes little sense;
I think this comes down to what is a regular expression and what is not
a regular expression.
Post by Janis Papanagnou
mind that the basic syntax of Awk programs is described as
pattern { action }
I'm guessing that 40-60% of the awk that I use doesn't use what I would
consider to be regular expressions.
[...]
Maybe I have an imprecise understanding / definition.
Your definition matches the common naming, where I deliberately
deviate from. (I think that "pattern" is an inferior naming and
"condition" should better be used, where a 'condition' can also
be a regexp that I regularly write as '/regexp/' or '/pattern/'
in explanations.) So I agree that it's likely that this alone
doesn't serve well as explanation for the existence of regexps
in Awk. The rationale is better seen in the statement "Awk was
designed to see how Sed and Grep could be generalized." that I
quoted (not literally, but from the original Awk book).

Janis
Mr. Man-wai Chang
2024-03-08 15:56:38 UTC
Post by Janis Papanagnou
Post by Mr. Man-wai Chang
The original Awk doesn't support regular expressions, right?
Where did you get that from? - Awk without regexps makes little sense;
mind that the basic syntax of Awk programs is described as
/pattern/ { action }
What would remain if there's no regexp patterns; string comparisons?
Post by Mr. Man-wai Chang
Because regex was not yet talked about back then??
Stable Awk (1985) was released in 1987. The (initial) old Awk (1977) was
released in 1979. Before that tool we had Sed (1974), and before that we
had Ed and Grep (1973). My perception is that regexps were there as a
basic concept of UNIX in all these tools, so why should Awk be exempt?
According to the authors Awk was designed to see how Sed and Grep could
be generalized.
That part of history is beyond me. Sorry... my fault for not doing a check.
Janis Papanagnou
2024-03-09 02:15:00 UTC
Post by Mr. Man-wai Chang
Post by Janis Papanagnou
Stable Awk (1985) was released in 1987. The (initial) old Awk (1977) was
released in 1979. Before that tool we had Sed (1974), and before that we
had Ed and Grep (1973). My perception is that regexps were there as a
basic concept of UNIX in all these tools, so why should Awk be exempt?
According to the authors Awk was designed to see how Sed and Grep could
be generalized.
That part of history is beyond me. Sorry... my fault for not doing a check.
The mistake may stem from a myth (I heard it before already); it may
have been misinterpreted where it's said that in the first Awk there
was no match function (which is true, but it means the concrete match()
function, not the abstract function of a (regexp) pattern match).

Janis
Kaz Kylheku
2024-03-08 09:38:50 UTC
Post by Keith Thompson
But awk doesn't work with fixed-width data. The length of each field,
and the length of $0, is variable.
GNU Awk, however, can. It has a FIELDWIDTHS variable where you can
specify column widths that are used instead of the FS field separator.

There is also FPAT (see below).
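A minimal FIELDWIDTHS sketch (GNU Awk only; the widths below carve 'one two three' into three fields with one-column separators between them):

```shell
# Fixed-width splitting: columns 1-3, 4, 5-7, 8, 9-13.
echo 'one two three' | gawk 'BEGIN { FIELDWIDTHS = "3 1 3 1 5" } { print $3 }'
# prints "two"
```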
Post by Keith Thompson
If awk *purely* dealt with input lines only as lists of tokens, then
echo 'one two three' | awk '{print $0}'
would print "one two three" rather than "one two three" (and awk would
lose the ability to deal with arbitrarily formatted input). The fact
that the inter-field whitespace is reset only when individual fields are
touched feels arbitrary to me.
There is no inter-field whitespace.

There is the original record in $0, and parsed out fields in $1, $2, ..

The fields don't have any space.

The space comes from the value of the OFS variable.

$ echo 'one two three' | awk -v OFS=: '{ $1=$1; print }'
one:two:three

GNU Awk has a FPAT mechanism by which we can specify the positive
regex for recognizing fields as tokens. By means of that, we can
save the separating whitespace, turning it into a field:

$ echo 'one two three' | \
awk -v FPAT='[^ ]+|[ ]+' -v OFS= \
'{ $1=$1; print; print NF }'
one two three
5

There you go. We now have 5 fields. The interstitial space is a field.
We set OFS to empty and so $1=$1 doesn't collapse the separation.
Post by Keith Thompson
Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.
The proposed feature of preserving the whitespace separation is a niche
use case in relation to Awk's orientation toward tabular data.

In tabular data that is not formatted into nice columns for a monospaced
font, the whitespace doesn't matter. Awk's behavior is that it will
normalize the separation.

In tabular data that is aligned visually, preserving the whitespace
will not work if any of your field edits change a field width.

I believe that your niche use case has value, though.

That's why, in the TXR Lisp Awk macro, I implemented something which
helps with that use case: the "kfs" variable (keep field separators).
This Boolean variable, if true, causes the separating character
sequences to be retained, and turned into fields.

$ echo 'one two three' | txr -e '(awk (:set kfs t) (t))'
one two three

We can see the field list f instead, printed machine-readably:

$ echo 'one two three' | txr -e '(awk (:set kfs t) (t (prinl f)))'
("" "one" " " "two" " " "three" "")

There is a leading and trailing empty separator. There is a
very good reason for that, in that the default strategy in Awk
is that, for instance " a b c " produces three fields.
For consistency, if we retain separation, we should always have
five fields where there would be three.

My FPAT approach above in the GNU Awk example won't do this correctly;
more work is needed.
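For awks without FPAT, a portable sketch of the same idea is to walk the record with match(), keeping each separator next to its field; this version merely round-trips the record, but the separator substrings are available for editing in between:

```shell
printf 'one   two  three\n' | awk '
{
    rest = $0; out = ""
    while (match(rest, /[^ \t]+/)) {
        # separator before the field, then the field itself
        out  = out substr(rest, 1, RSTART - 1) substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest   # "rest" is any trailing whitespace
}'
# prints the input unchanged: one   two  three
```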

This kfs variable is not so recent; I implemented it in 2016.
Christian Weisgerber
2024-03-09 12:27:05 UTC
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test >
newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
That had me scratching my head. You can't have an action without
enclosing braces. But it's still legal syntax because... it's an
expression serving as a pattern. The assignment itself is a side
effect.

Care needs to be taken when using this shortcut so the expression
doesn't evaluate as false:

$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=4'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=0'
$

$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2="4"'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=""'
$
--
Christian "naddy" Weisgerber ***@mips.inka.de
Julieta Shem
2024-03-09 14:52:09 UTC
Post by Christian Weisgerber
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test >
newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
That had me scratching my head. You can't have an action without
enclosing braces. But it's still legal syntax because... it's an
expression serving as a pattern. The assignment itself is a side
effect.
Without braces, the default action takes place, which is ``{print}''.
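For anyone following along, both spellings below print only the second input line:

```shell
printf 'a\nb\n' | awk 'NR == 2'            # pattern only; default action is { print }
printf 'a\nb\n' | awk 'NR == 2 { print }'  # the same, spelled out
# each prints "b"
```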
Kenny McCormack
2024-03-09 15:40:04 UTC
Post by Julieta Shem
Post by Christian Weisgerber
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test >
newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
That had me scratching my head. You can't have an action without
enclosing braces. But it's still legal syntax because... it's an
expression serving as a pattern. The assignment itself is a side
effect.
Without braces, the default action takes place, which is ``{print}''.
Somehow, I think Christian knows that (since everybody knows that).

My guess is that he just doesn't like it...
--
"Everything Roy (aka, AU8YOG) touches turns to crap."
--citizens of alt.obituaries--
Ed Morton
2024-03-09 16:52:31 UTC
Post by Christian Weisgerber
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test >
newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
That had me scratching my head. You can't have an action without
enclosing braces. But it's still legal syntax because... it's an
expression serving as a pattern. The assignment itself is a side
effect.
Care needs to be taken when using this shortcut so the expression
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:

'{$2="1-1"} 1'

instead of:

$2="1-1"

unless they NEED the result of the action to be evaluated as a
condition, for that very reason.

Ed.
Post by Christian Weisgerber
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=4'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=0'
$
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2="4"'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=""'
$
Janis Papanagnou
2024-03-09 20:07:00 UTC
Post by Ed Morton
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
'{$2="1-1"} 1'
I don't recall such a "consensus". If you want to avoid cryptic code
you'd rather write

'{$2="1-1"; print}'

Don't you think?

And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings what may be defined
as data, line missing or spurious blanks in the data, comment lines
or empty lines that have to be preserved, etc.)
Post by Ed Morton
$2="1-1"
Janis
Kaz Kylheku
2024-03-09 20:49:43 UTC
Post by Janis Papanagnou
Post by Ed Morton
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
'{$2="1-1"} 1'
I don't recall such a "consensus". If you want to avoid cryptic code
you'd rather write
'{$2="1-1"; print}'
Don't you think?
I don't remember it either, but it's a no brainer that '$2=expr'
is incorrect if expr is arbitrary, and the intent is that
the implicit print is to be unconditionally invoked.

If expr is a nonblank, nonzero literal term, then the assignment
is obviously true and '$2=literal' as the entire program is a fine
idiom.

I don't agree with putting braces around it and adding 1, or explicit
print.

You are not then using Awk like it was meant to be.

When Awk was conceived, the authors peered into a crystal ball and saw
Perl. After the laughter died down, they got serious and made sure to
provide for idioms like:

awk '!s[$0]++'
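That idiom prints each line only the first time it is seen: s[$0]++ evaluates to 0 (false) on the first sight of a line and non-zero afterwards, so negating it keeps exactly the first occurrence.

```shell
# Deduplicate while preserving input order.
printf 'a\nb\na\nc\nb\n' | awk '!s[$0]++'
# prints:
#   a
#   b
#   c
```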
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Ed Morton
2024-03-12 22:49:09 UTC
Post by Janis Papanagnou
Post by Ed Morton
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
'{$2="1-1"} 1'
I don't recall such a "consensus".
I do, I have no reason to lie about it, but I can't be bothered
searching through 20-year-old usenet archives for it (I did take a very
quick shot at it but I don't even know how to write a good search for it
- you can't just google "awk '1'" and I'm not even sure if it was in
comp.lang.awk or comp.unix.shell).
Post by Janis Papanagnou
If you want to avoid cryptic code you'd rather write
'{$2="1-1"; print}'
Don't you think?
If I'm writing a multi-line script I use an explicit `print` but it just
doesn't matter for a tiny one-line script like that. Everyone using awk
needs to know the `1` idiom as it's so common and once you've seen it
once it's not hard to figure out what `{$2="1-1"} 1` does.

By changing `condition` to `{condition}1` we just add 3 chars to remove
the guesswork from anyone reading it in future and protect against
unconsidered values so we don't just make it less cryptic but also less
fragile.

For example, let's say someone wants to copy the $1 value into $3 and
print every line:

$ printf '1 2 3\n4 5 7\n' | awk '{$3=$1}1'
1 2 1
4 5 4

$ printf '1 2 3\n0 5 7\n' | awk '{$3=$1}1'
1 2 1
0 5 0

$ printf '1 2 3\n4 5 7\n' | awk '$3=$1'
1 2 1
4 5 4

$ printf '1 2 3\n0 5 7\n' | awk '$3=$1'
1 2 1

Note the 2nd line is undesirably (because I wrote the requirements)
missing from that last output.

It happens ALL the time that people don't consider all possible input
values so it's safer to just write the code that reflects your intent
and if you intend for every line to be printed then write code that will
print every line.

Ed.
Post by Janis Papanagnou
And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings what may be defined
as data, line missing or spurious blanks in the data, comment lines
or empty lines that have to be preserved, etc.)
Post by Ed Morton
$2="1-1"
Janis
Janis Papanagnou
2024-03-14 02:01:14 UTC
Post by Ed Morton
Post by Janis Papanagnou
Post by Ed Morton
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
'{$2="1-1"} 1'
I don't recall such a "consensus".
I do, I have no reason to lie about it, but I can't be bothered
searching through 20-year-old usenet archives for it (I did take a very
quick shot at it but I don't even know how to write a good search for it
- you can't just google "awk '1'" and I'm not even sure if it was in
comp.lang.awk or comp.unix.shell).
I didn't say anything about "lying"; why do you insinuate so?

But your memory may mislead you. (Or mine, or Kaz', of course.)

(And no, I don't do the search for you; since you have been the
one contending something here.)

Without a reference such a statement is just void (and not more
than a rhetorical move).

You should at least elaborate on the details and facts of that
"consensus" - but for the _specific OP context_ (not for made
up cases).
Post by Ed Morton
Post by Janis Papanagnou
If you want to avoid cryptic code you'd rather write
'{$2="1-1"; print}'
Don't you think?
If I'm writing a multi-line script I use an explicit `print` but it just
doesn't matter for a tiny one-line script like that.
Actually, for the given case, an even better solution is what the
OP himself said (in CUS, where his question was initially posted):

Grant Taylor on alt.comp.software.thunderbird suggested [...]:
$ awk '{print $1, "1-1"}'

Since this suggestion doesn't overwrite fields and is conceptually
clear. It inherently also handles (possible?) cases where there's
more than two fields in the data (e.g. by spurious blanks).
Post by Ed Morton
Everyone using awk
needs to know the `1` idiom as it's so common and once you've seen it
once it's not hard to figure out what `{$2="1-1"} 1` does.
The point is that $2="1-1" as condition is also an Awk idiom.
Post by Ed Morton
By changing `condition` to `{condition}1` we just add 3 chars to remove
the guesswork from anyone reading it in future and protect against
unconsidered values so we don't just make it less cryptic but also less
fragile.
Your examples below are meaningless since you make up cases that have
nothing to do with the situation here, and especially in context of
my posting saying clearly: "In this specific case of regular data".

The more problematic issue is that $2="1-1" and also {$2="1-1"}
both overwrite fields and thus a reorganization of the fields is
done which has - probably unexpected by a newbie coder - side effects.

But YMMV, of course.

Janis
Post by Ed Morton
For example, let's say someone wants to copy the $1 value into $3 and
$ printf '1 2 3\n4 5 7\n' | awk '{$3=$1}1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '{$3=$1}1'
1 2 1
0 5 0
$ printf '1 2 3\n4 5 7\n' | awk '$3=$1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '$3=$1'
1 2 1
Note the 2nd line is undesirably (because I wrote the requirements)
missing from that last output.
It happens ALL the time that people don't consider all possible input
values so it's safer to just write the code that reflects your intent
and if you intend for every line to be printed then write code that will
print every line.
Ed.
Post by Janis Papanagnou
And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings what may be defined
as data, line missing or spurious blanks in the data, comment lines
or empty lines that have to be preserved, etc.)
Post by Ed Morton
$2="1-1"
Janis
Ed Morton
2024-03-14 10:30:16 UTC
Post by Janis Papanagnou
Post by Ed Morton
Post by Janis Papanagnou
Post by Ed Morton
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
'{$2="1-1"} 1'
You snipped the important end of my statement. What I said was:

----
we should encourage people to always write:

'{$2="1-1"} 1'

instead of:

$2="1-1"

unless they NEED the result of the action to be evaluated as a condition
----

The "unless they NEED to" is important given your statement below about
$2="1-1" being an awk idiom which I'll address below.
Post by Janis Papanagnou
Post by Ed Morton
Post by Janis Papanagnou
I don't recall such a "consensus".
I do, I have no reason to lie about it, but I can't be bothered
searching through 20-year-old usenet archives for it (I did take a very
quick shot at it but I don't even know how to write a good search for it
- you can't just google "awk '1'" and I'm not even sure if it was in
comp.lang.awk or comp.unix.shell).
I didn't say anything about "lying"; why do you insinuate so?
I don't insinuate so, but I'm not sure whether you're arguing that what
I said is not good general advice, or that it is good advice but you
don't believe the discussion I referred to happened. So I'm just
eliminating one of the possibilities for why I'd say it happened: either
it happened and I remember it, or it didn't happen and I think it did, or
it didn't happen and I'm lying about it. I can rule out that I'm lying
about it, and I'd like to think it's more likely it happened and I
remember it than that it didn't happen and I dreamed it up.
Post by Janis Papanagnou
But your memory may mislead you. (Or mine, or Kaz', of course.)
(And no, I don't do the search for you; since you have been the
one contending something here.)
Finding the discussion wouldn't be for me. I know it happened, I know
how to write such code, I've provided an example of why it's good advice,
and everyone else can do whatever they like with that information. I'd
have liked to provide the discussion for reference but couldn't find it.
Oh well.
Post by Janis Papanagnou
Without a reference such a statement is just void (and not more
than a rhetorical move).
Maybe it would be, if I didn't have a good reputation for awk knowledge
on this and other forums over the past 30 years or so and/or hadn't
provided an example of why it's good general advice.
Post by Janis Papanagnou
You should at least elaborate on the details and facts of that
"consensus" - but for the _specific OP context_ (not for made
up cases).
I have nothing to elaborate with. I remember a discussion about 20 years
ago and I remember the conclusion. That's all. Asking for a statement
related to the OPs specific code is like if I had said "we should
encourage people to always quote their shell variables unless they NEED
the shell to perform globbing, etc. on it" and you asked for an impact
statement of not doing so for `var=7; echo $var`. A specific piece of
code not breaking doesn't invalidate good general advice.
Post by Janis Papanagnou
Post by Ed Morton
Post by Janis Papanagnou
If you want to avoid cryptic code you'd rather write
'{$2="1-1"; print}'
Don't you think?
If I'm writing a multi-line script I use an explicit `print` but it just
doesn't matter for a tiny one-line script like that.
Actually, for the given case, an even better solution is what the
$ awk '{print $1, "1-1"}'
Since this suggestion doesn't overwrite fields and is conceptually
clear. It inherently also handles (possible?) cases where there's
more than two fields in the data (e.g. by spurious blanks).
Of course.
Post by Janis Papanagnou
Post by Ed Morton
Everyone using awk
needs to know the `1` idiom as it's so common and once you've seen it
once it's not hard to figure out what `{$2="1-1"} 1` does.
The point is that $2="1-1" as condition is also an Awk idiom.
That's like saying `echo $var` is a shell idiom. There are times when
you need to do it, e.g. when you want the shell to perform globbing,
word splitting, and filename generation on `$var`, but it wouldn't
invalidate the good general advice that "we should encourage people to
always quote their shell variables unless they NEED the shell to perform
globbing, etc. on it"
Post by Janis Papanagnou
Post by Ed Morton
By changing `condition` to `{condition}1` we just add 3 chars to remove
the guesswork from anyone reading it in future and protect against
unconsidered values so we don't just make it less cryptic but also less
fragile.
Your examples below are meaningless since you make up cases that have
nothing to do with the situation here, and especially in context of
my posting saying clearly: "In this specific case of regular data".
That's again like arguing that if I had said "we should encourage people
to always quote their shell variables unless they NEED the shell to
perform globbing, etc. on it" it'd be meaningless if the OPs sample code
in this specific case didn't break without quotes.
Post by Janis Papanagnou
The more problematic issue is that $2="1-1" and also {$2="1-1"}
both overwrite fields and thus a reorganization of the fields is
done which has - probably unexpected by a newbie coder - side effects.
That's not an issue if the OP's intent is to replace all strings that
match FS with OFS, which I've no idea if they want to do or not, but if
it is an issue it's a completely different one, unrelated to the
statement I made, and which we've been discussing, that we should
encourage people to always write `{$2="1-1"} 1` instead of just
`$2="1-1"` unless they
NEED the result of the action to be evaluated as a condition.
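That side effect can be seen with a contrived input line (made up for this demo): assigning to any field makes awk rebuild $0 from the fields joined by OFS (a single space by default), so the original runs of whitespace between fields are collapsed.

```shell
# Input has multiple spaces and a tab between fields; after the
# assignment to $2, awk rejoins the fields with single spaces.
printf 'one   two\tthree\n' | awk '{$2="1-1"; print}'
# prints: one 1-1 three
```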

Ed.
Post by Janis Papanagnou
But YMMV, of course.
Janis
Post by Ed Morton
For example, let's say someone wants to copy the $1 value into $3 and print every line:
$ printf '1 2 3\n4 5 7\n' | awk '{$3=$1}1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '{$3=$1}1'
1 2 1
0 5 0
$ printf '1 2 3\n4 5 7\n' | awk '$3=$1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '$3=$1'
1 2 1
Note that the 2nd line is undesirably (because I wrote the requirements)
missing from that last output.
It happens ALL the time that people don't consider all possible input
values, so it's safer to just write code that reflects your intent:
if you intend for every line to be printed, then write code that will
print every line.
Ed.
Post by Janis Papanagnou
And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings what may be defined
as data, line missing or spurious blanks in the data, comment lines
or empty lines that have to be preserved, etc.)
Post by Ed Morton
$2="1-1"
Janis
Mr. Man-wai Chang
2024-03-11 11:38:42 UTC
Permalink
Post by Ed Morton
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
'{$2="1-1"} 1'
$2="1-1"
unless they NEED the result of the action to be evaluated as a
condition, for that very reason.
You might Google about it, but Google has unplugged its Usenet support.
I dunno whether you could search old Usenet messages. There is still
Wayback Machine archive.

awk '{$2="1-1"} 1' - Google Search
<https://www.google.com/search?q=awk+%27%7B%242%3D%221-1%22%7D+1%27>

Internet Archive: Wayback Machine
<https://archive.org/web/>

Google Groups ditches links to Usenet, the OG social network • The Register
<https://www.theregister.com/2023/12/18/google_ends_usenet_links/>
Keith Thompson
2024-03-11 15:09:06 UTC
Permalink
Post by Mr. Man-wai Chang
Post by Ed Morton
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
'{$2="1-1"} 1'
$2="1-1"
unless they NEED the result of the action to be evaluated as a
condition, for that very reason.
You might Google about it, but Google has unplugged its Usenet
support. I dunno whether you could search old Usenet messages. There
is still Wayback Machine archive.
[...]

You might Google about it yourself before posting.

Google Groups has shut down its Usenet interface, but messages posted
before 2024-02-22 are still available.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
Janis Papanagnou
2024-03-09 20:00:02 UTC
Permalink
Post by Christian Weisgerber
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test >
newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
That had me scratching my head.
Part of the joy of programming in Awk. ;-)
Post by Christian Weisgerber
You can't have an action without
enclosing braces. But it's still legal syntax because...
it's an expression serving as a pattern.
This is the key observation!

Here we have only a condition in the general `condition { action }` form.
Post by Christian Weisgerber
The assignment itself is a side effect.
Assignments generally have a side effect, inherently. :-)
Post by Christian Weisgerber
Care needs to be taken when using this shortcut so the expression
I've carefully formulated "In this specific case of regular data ..."
Post by Christian Weisgerber
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=4'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=0'
$
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2="4"'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=""'
$
Other questions on the data may be whether...
- the article number list may contain spaces
- the space after the colon is always present
- blank lines may exist in the file
- comment lines are possible in the file

These all will require a more "complex" awk pattern or action, yet
are still simply solvable. Maybe something like

BEGIN { FS=":[[:space:]]*" }
!NF || /^[[:space:]]*#/ || $0=$1": 1-1"
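A quick run of that sketch on made-up sample data (the group names here are hypothetical) shows blank and comment lines passing through untouched while data lines are rewritten:

```shell
# Blank lines (!NF) and comment lines pass through as-is;
# everything else is rewritten to "group: 1-1".
printf 'comp.lang.awk: 1-400\n\n# a comment\nmisc.test: 3-9\n' |
awk 'BEGIN { FS=":[[:space:]]*" }
     !NF || /^[[:space:]]*#/ || $0=$1": 1-1"'
# prints:
# comp.lang.awk: 1-1
#
# # a comment
# misc.test: 1-1
```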


Janis