Discussion: A feature I'd like to see in GAWK...
Kenny McCormack
2024-07-15 18:28:31 UTC
As we know, AWK in general, and GAWK in particular, has several different
ways of getting data into the program. In addition to the Automatic Input
Loop (the main feature of AWK), there are several variations of "getline".

"getline" can be used with files, or with processes (in 2 different ways!),
or even with network sockets. But the problem with getline is that using
it breaks the Automatic Input Loop. You can't use the standard
"pattern/action" paradigm if your input is coming in via "getline". Yes,
there are workarounds, and yes, we've all gotten used to it, but it is a
shame. One workaround is to write your program as a shell script and use
the shell to pipe in the data from a process - but that is ugly, and not
always sufficient.
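
To make that concrete, here is a minimal sketch of the usual workaround (the
command is just "df" for illustration): every record has to be pulled in by
hand with getline, typically inside BEGIN, and none of it flows through
pattern/action rules:

BEGIN {
    cmd = "df"                            # example command; could be anything
    while ((cmd | getline line) > 0) {    # explicit read loop instead of the Automatic Input Loop
        n = split(line, f)                # manual splitting; this getline form does not update $0, NF, NR or FNR
        print f[1]
    }
    close(cmd)
}

None of that can be written as a pattern/action rule, which is exactly the
loss being described.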

Now, I have written a GAWK extension to handle this - called "pipeline".
Here is a sample script that uses "pipeline". Note that the Linux "df"
command has a "-l" option to show you only the local filesystems, but what
I usually want is the non-local ones - that's much more interesting. The
only way I can figure out how to get that is to run "df" twice and compare the
output with and without "-l". Here is my program (non-local-df):

--- Cut Here ---
@load "pipeline"
@include "abort"
# Note: You can ignore the "abort" stuff. It is part of my ecosystem, but
# probably not part of yours.
BEGIN {
    testAbort(ARGC > 1, "This program takes no args!!!", 1)
    pipeline("in", "df -l")                  # first pass: local filesystems only
    while (ARGC < 3)
        ARGV[ARGC++] = "-"                   # give the input loop two placeholder "-" arguments
}
ENDFILE { if (ARGIND == 1) pipeline("in", "df") }   # when the first pass ends, start the unrestricted "df"
ARGIND == 1 { x[$1]; next }                  # first pass: remember every local filesystem
FNR == 1 || !($1 in x)                       # second pass: print the header plus anything not seen locally
--- Cut Here ---

Needless to say, I'd like to see this sort of functionality built-in.

It seems to me that GAWK has been fishing around lately, looking for new
worlds to conquer. Some recently added features seem (to me, anyway) sort
of "out of place": namespaces, MPFR arithmetic (apparently now deprecated),
and persistent memory (a nifty idea, though I don't really see the
practicality - and I haven't gotten around to testing it, i.e., compiling
a new enough version to try it).

I think something like the above would be more in line with the sort of
things I'd like to see in GAWK.
--
Adderall, pseudoephed, teleprompter
Mack The Knife
2024-07-16 14:29:10 UTC
While this is interesting, it can actually be done very easily from the
shell level, using process substitution:

awk -f foo.awk <(df -l) <(df)
Post by Kenny McCormack
[...]
Now, I have written a GAWK extension to handle this - called "pipeline".
[...]
Needless to say, I'd like to see this sort of functionality built-in.
Kenny McCormack
2024-07-16 16:25:28 UTC
Post by Mack The Knife
While this is interesting, it can actually be done very easily from the
shell level, using process substitution:
awk -f foo.awk <(df -l) <(df)
Which, as noted in the OP, is ugly and not AWK, but rather shell.
(As I said, we all know the workarounds - and we all know they are ugly.)

And it doesn't work if you have to compute the command to run inside the
AWK script (which isn't the case with my "df" example, but is why I used
the phrase "not always sufficient").
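
For example (a rough sketch - the "-t ext4" filter just stands in for
whatever the script would really compute), once the command string is built
inside the program the shell can no longer arrange the input in advance, and
you are back to getline:

BEGIN {
    fstype = "ext4"                   # stands in for a value computed at run time
    cmd = "df -t " fstype             # the command only exists inside the awk program
    while ((cmd | getline line) > 0)  # so process substitution in the shell cannot help
        print line
    close(cmd)
}
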
--
After 4 years of disastrous screwups, Trump now favors 3 policies that I support:
1) $2K/pp stimulus money. Who doesn't want more money?
2) Water pressure. My shower doesn't work very well; I want Donnie to come fix it.
3) Repeal of Section 230. This will lead to the demise of Face/Twit/Gram. Yey!
Kaz Kylheku
2024-07-16 17:10:25 UTC
Post by Kenny McCormack
--- Cut Here ---
@load "pipeline"
@include "abort"
# Note: You can ignore the "abort" stuff. It is part of my ecosystem, but
# probably not part of yours.
BEGIN {
testAbort(ARGC > 1,"This program takes no args!!!",1)
pipeline("in","df -l")
while (ARGC < 3)
ARGV[ARGC++] = "-"
}
ENDFILE { if (ARGIND == 1) pipeline("in","df") }
ARGIND == 1 { x[$1]; next }
FNR == 1 || !($1 in x)
--- Cut Here ---
Needless to say, I'd like to see this sort of functionality built-in.
TXR Lisp Awk macro:

1> (awk (:inputs (open-command "df -l")) (#/tmpfs/ (prn [f 5])))
/run
/dev/shm
/run/lock
/sys/fs/cgroup
/run/user/122
/run/user/500
nil

:inputs arguments can be files, lists of strings, or input streams.

2> (awk (:inputs '("alpha beta" "gamma delta")) (t (prn [f 0])))
alpha
gamma
nil
3> (awk (:inputs "/etc/hostname") (t (prn [f 0])))
sun-go
nil

nil is the return value of the awk expression. You can control that.
The awk construct establishes a hidden block named awk around
your code.

E.g. return the first tmpfs path from "df -l":

4> (awk (:inputs (open-command "df -l"))
(#/tmpfs/ (return-from awk [f 5])))
"/run"
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Arti F. Idiot
2024-07-16 20:05:56 UTC
Post by Kenny McCormack
I think something like the above would be more in line with the sort of
things I'd like to see in GAWK.
+1 ; great idea.
Kenny McCormack
2024-07-17 12:23:00 UTC
Post by Arti F. Idiot
Post by Kenny McCormack
I think something like the above would be more in line with the sort of
things I'd like to see in GAWK.
+1 ; great idea.
Well, I think so. The idea is that you shouldn't have to give up the most
intrinsic part of AWK (the pattern/action paradigm) just because your input
isn't a named file (i.e., one given on the command line).

I think of it as "rehabilitating getline". Bringing it back into the fold,
rather than exiling it to the sidelines.

Note also that my "pipeline" extension only handles the case of a simple
process (either input or output - i.e., like AWK's "getline" and "print"
with "|" redirection). It doesn't handle any of the other variations of
getline/print - such as the ones that interface with network sockets. It
would be nice if a built-in approach did those things as well (and better
than my extension does).
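
For comparison, gawk's existing network support (the "/inet/..." special
file names read with the "|&" coprocess operator) also lives outside the
main input loop. A rough sketch, assuming a daytime service happens to be
listening on localhost:

BEGIN {
    svc = "/inet/tcp/0/localhost/daytime"   # gawk special file name: TCP client, any local port
    while ((svc |& getline line) > 0)       # again getline, not the pattern/action loop
        print line
    close(svc)
}

A built-in successor to "pipeline" could presumably fold this case into the
pattern/action loop the same way it folds in plain processes.
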
--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/FreeCollege
Jeremy Brubaker
2024-07-19 14:26:35 UTC
Post by Kenny McCormack
[...]
Now, I have written a GAWK extension to handle this - called "pipeline".
That sounds quite useful. I am fairly certain I have wished a feature
like that existed and ended up just wrapping awk with sh - but I agree
that's ugly.

Awk is underrated, IMHO. Not that JSON/YAML/etc. aren't useful things, but
frequently when I see them used my first thought is, "If you had just
done well-formatted text records, I could have parsed this with awk".
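
For instance, given a hypothetical tab-separated log (timestamp, level,
message - all invented here), the entire "parser" is one field separator and
one pattern/action rule:

BEGIN { FS = "\t" }               # hypothetical records: timestamp <TAB> level <TAB> message
$2 == "ERROR" { print $1, $3 }    # print the timestamp and message of every error line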

--
() www.asciiribbon.org | Jeremy Brubaker
/\ - against html mail | jbrubake@orionarts.io / neonrex on IRC

Even a hawk is an eagle among crows.
