Discussion:
printing words without newlines?
David Chmelik
2024-05-12 04:57:16 UTC
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)... is this normal (and I made mistake?) or am I
approaching it wrong? I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }

# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
while(getline<file) arr[$1]=$0
PROCINFO["sorted_in"]="@ind_num_asc"
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
}
------------------------------------------------------------------------
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to
Bruce Horrocks
2024-05-12 08:52:51 UTC
Post by David Chmelik
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)... is this normal (and I made mistake?) or am I
approaching it wrong? I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
while(getline<file) arr[$1]=$0
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
}
------------------------------------------------------------------------
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to
You need to set ORS in the BEGIN { } section (or on the command line).

See
<https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
for an example - just replace the "\n\n" in the example with " " to see
the effect you are looking for.
--
Bruce Horrocks
Surrey, England
Bruce Horrocks
2024-05-12 08:55:52 UTC
Post by Bruce Horrocks
Post by David Chmelik
I'm learning more AWK basics and wrote function to read file, sort,
print.  I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)... is this normal (and I made mistake?) or am I
approaching it wrong?  I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being
specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
   while(getline<file) arr[$1]=$0
   for(i in arr)
   {
     split(arr[i],arr2)
     # output all words or on one line with ORS
     print arr2[2]
     # output all words on one line without needing ORS
     #printf("%s ",arr2[2])
   }
}
------------------------------------------------------------------------
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to
You need to set ORS in the BEGIN { } section (or on the command line).
See
<https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html> for an example - just replace the "\n\n" in the example with " " to see the effect you are looking for.
Let me re-phrase that: it would be better to set ORS in the BEGIN {}
section. I'm not sure why yours is not working but with some commented
out code and some not, your example is unclear.

If what I have suggested doesn't work for you then please re-post your
exact code.
--
Bruce Horrocks
Surrey, England
Kenny McCormack
2024-05-12 12:11:27 UTC
In article <e0be0c38-e14e-45ba-ac87-***@scorecrow.com>,
Bruce Horrocks <***@scorecrow.com> wrote:
...
Post by Bruce Horrocks
You need to set ORS in the BEGIN { } section (or on the command line).
This is demonstrably false. You can set ORS whenever/wherever you want.
Whatever value it has when a plain "print" statement is executed, is what
will be used. You are probably thinking about the various variables
that affect input parsing. These variables clearly must be set prior to the
reading of the input, which usually means they need to be set in BEGIN (or
via something like -F or -v on the command line).

One of my favorite idioms (and one that might actually be useful to OP) is:

# Print every 3 input lines as a single output line
# Yes, this single line is the whole program!
ORS = NR % 3 ? " " : "\n"
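
A minimal sketch of that idiom in use, assuming a POSIX shell and any POSIX awk; the function name group_by_three is made up for illustration. The assignment itself is the pattern, and since both " " and "\n" are non-empty strings the pattern is always true, so every record is printed with the ORS that was just chosen:

```shell
# group_by_three: hypothetical helper name, not from the thread.
# Prints every 3 input lines as a single output line.
group_by_three() {
    awk 'ORS = NR % 3 ? " " : "\n"'
}

# Six input lines become two output lines.
printf '%s\n' one two three four five six | group_by_three
```

If the number of input lines is not a multiple of 3, the output ends with a space rather than a newline.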
Post by Bruce Horrocks
See
<https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
for an example - just replace the "\n\n" in the example with " " to see
the effect you are looking for.
Of course, the whole point of this thread is that none of us has any idea
what OP is talking about or what his actual problem is. We can only guess...
--
"It does a lot of things half well and it's just a garbage heap of ideas that are
mutually exclusive."

- Ken Thompson, on C++ -
David Chmelik
2024-05-13 02:04:50 UTC
Post by Kenny McCormack
Of course, the whole point of this thread is that none of us has any
idea what OP is talking about or what his actual problem is. We can
only guess...
Not the point. I stated I'm trying AWK... problem is in subject line.
Surprisingly, after rebooting PC, it all works now (un)commenting
particular parts (ORS or commenting out print and uncommenting printf).
Post by Bruce Horrocks
Let me re-phrase that: it would be better to set ORS in the BEGIN {}
section. I'm not sure why yours is not working but with some commented
out code and some not, your example is unclear.
Okay. What I posted works to read file, sort, print lines; I commented
out two versions that (initially) didn't work to print all on one line
(ORS or commenting out print and uncommenting printf). After rebooting
(maybe just needed to restart shell?) those worked as expected... with ORS
in BEGIN but alternatively in function I wrote. I guess as Mr McCormack
explained, one might have reasons to change ORS in different functions.
Kaz Kylheku
2024-05-13 16:49:59 UTC
Post by Kenny McCormack
...
Post by Bruce Horrocks
You need to set ORS in the BEGIN { } section (or on the command line).
This is demonstrably false. You can set ORS whenever/wherever you want.
Whatever value it has when a plain "print" statement is executed, is what
will be used. You are probably thinking about the various variables
that affect input parsing. These variables clearly must be set prior to the
reading of the input, which usually means they need to be set in BEGIN (or
via something like -F or -v on the command line).
# Print every 3 input lines as a single output line
# Yes, this single line is the whole program!
ORS = NR % 3 ? " " : "\n"
Post by Bruce Horrocks
See
<https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
for an example - just replace the "\n\n" in the example with " " to see
the effect you are looking for.
Of course, the whole point of this thread is that none of us has any idea
what OP is talking about or what his actual problem is. We can only guess...
The problem seems to be that there is a file of words preceded by
unique integer ranks which indicate the order. They are to be reproduced
in rank order, on one line.

This is the TXR Lisp interactive listener of TXR 294.
Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
Self-assembly keeps TXR costs low; but ask about our installation service!
1> (flow "data.txt"
file-get-lines
(mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
transpose
(select (second @1) (first @1))
(join-with " ")
put-line)
all your base are belong to us

We can insert prints into the pipeline to see the transformations:

2> (flow "data.txt"
prinl
file-get-lines
prinl
(mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
prinl
transpose
prinl
(select (second @1) (first @1))
prinl
(join-with " ")
prinl
put-line)
"data.txt"
("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
(#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
#(5 "to"))
#(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
#("all" "your" "base" "are" "belong" "to" "us")
"all your base are belong to us"
all your base are belong to us
t

That is tedious; say, why not make a macro dflow (debug flow) which inserts
those prinl's for us?

3> (defmacro dflow (. args)
^(flow ,*(interpose 'prinl args)))
dflow

Sanity check: is it inserting prinls?

4> (macroexpand-1 '(dflow a b c d))
(flow a prinl
b prinl c prinl
d)

Use dflow:

5> (dflow "data.txt"
file-get-lines
(mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
transpose
(select (second @1) (first @1))
(join-with " ")
put-line)
"data.txt"
("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
(#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
#(5 "to"))
#(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
#("all" "your" "base" "are" "belong" "to" "us")
"all your base are belong to us"
all your base are belong to us
t

After file-get-lines we have a list of strings like "2 your".

We map those through an anonymous function which matches the
string pattern `@a @b` to capture the space-separated text pieces.
A is converted to integer and mapped to its predecessor
(because we want to use it as an index, and indexing is zero based).
We map each string to a two element vector consisting of the
zero-based index as an integer type, and a string, so now we have:

(#(1 "your") #(0 "all") ...)

#(a b c) is a vector notation.

Then we want to transpose rows to columns to get the integer
column as a vector, and the values as a vector.

#(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))

Now we use the built-in function select which selects elements out
of a sequence, based on indices supplied in another sequence.

Now we have the vector of words in the right order; we just
join with a space.
j***@addr.invalid
2024-05-12 18:22:05 UTC
<snip>
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)... is this normal (and I made mistake?) or am I
approaching it wrong? I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
while(getline<file) arr[$1]=$0
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
}
<snip>
I think you forgot that arr2 is now an array => you have to iterate over
it as well. There were also a few other coding errors, i.e. not closing
the data.txt file; not declaring local vars in print_file_words:

--
$ cat test.awk
BEGIN { print_file_words("data.txt") }

function print_file_words(file, i,j) {
ORS = " "
PROCINFO["sorted_in"]="@ind_num_asc"
while (getline <file >0)
arr[$1] = $0
close (file)

for(i in arr) {
split(arr[i],arr2)
for (j in arr2)
print arr2[j]
}
ORS = "\n"
print ""
}

$ gawk -f test.awk
all are base belong to us your
--

Probably this is not the best way of doing things but I think you're
mainly just experimenting with sorting/printing so..
David Chmelik
2024-05-13 01:09:28 UTC
Post by j***@addr.invalid
<snip>
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)... is this normal (and I made mistake?) or am I
approaching it wrong? I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being
specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
}
<snip>
I think you forgot that arr2 is now an array => you have to iterate over
it as well. There were also a few other coding errors, ie. not closing
The split() sets arr2[2] equal to the current word of arr[i] (second column), so
the for() already iterates to update arr2 (it is only ever a two-element
array with a number (not printed) then word) and prints each word fine on
new lines when not trying to print them on one line. The only problem is
something went wrong with printf or ORS & print.
David Chmelik
2024-05-13 02:13:28 UTC
Post by j***@addr.invalid
<snip>
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)... is this normal (and I made mistake?) or am I
approaching it wrong? I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being
specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
}
<snip>
I think you forgot that arr2 is now an array => you have to iterate over
it as well. There were also a few other coding errors, ie. not closing
--
$ cat test.awk
BEGIN { print_file_words("data.txt") }
function print_file_words(file, i,j) {
while (getline <file >0)
arr[$1] = $0
close (file)
for(i in arr) {
split(arr[i],arr2)
for (j in arr2)
print arr2[j]
}
ORS = "\n"
print ""
}
$ gawk -f test.awk
all are base belong to us your
My original works after rebooting after discussion in main thread (without
'Re') but thanks for instruction to close file, though I don't know why you
need to pass in i--not used outside. It's odd that iterating over arr2
still prints all words (wrong order) because the way I used arr2 it only
ever had one number and one word--its point was to split out & get word,
then for the next i, it's split again onto arr2 which is erased/updated.
j***@addr.invalid
2024-05-13 04:50:39 UTC
Post by David Chmelik
<snip>
My original works after rebooting after discussion in main thread (without
'Re') but thanks for instruction to close file, though I don't know you
need to pass in i--not used outside. It's odd iterating over arr2 even
still prints all words (wrong order) because the way I used arr2 it only
ever had one number and one word--its point was to split out & get word,
then for the next i, it's split again onto arr2 which is erased/updated.
You're right that in your particular data case --one word per line--
arr2 is always of length 1 => you could use arr2[1]. But creating
the arr2 array via split() isn't even necessary since arr will print
out in the order specified in PROCINFO["sorted_in"]:
--
$ cat test.awk
BEGIN { print_file_words("data.txt") }

function print_file_words(file, i) {
ORS = " "
PROCINFO["sorted_in"]="@ind_num_asc"
while (getline <file >0)
arr[$1] = $0
close (file)

for(i in arr)
print arr[i]
ORS = "\n"
print ""
}

$ gawk -f test.awk data.txt
all are base belong to us your
--

WRT close() you should do it whenever you're finished reading from a
file OR command. WRT user-defined functions, variables intended to be
local to the function should be declared otherwise they become global
variables; try removing the "i" from the function print_file_words()
definition and tacking on the following to your code:

END { print "i =", i }

which will print "i = your" as the last line of output.
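
A small sketch of that scoping rule, assuming a POSIX shell and any POSIX awk; the names scope_demo, leaky, tidy and total are invented for illustration. A variable that only appears in the function body is global, while one listed as an extra parameter is local:

```shell
# scope_demo: hypothetical name; shows global leakage vs. a declared local.
scope_demo() {
    awk '
    # "i" is not in the parameter list, so it is a global variable.
    function leaky(n)        { for (i = 1; i <= n; i++) total += i }
    # "k" is an extra parameter, so it is local to the function.
    function tidy(n,    k)   { for (k = 1; k <= n; k++) total += k }
    BEGIN {
        leaky(3)
        tidy(3)
        # i kept its final loop value; k is still unset out here.
        printf "i=%s k=[%s]\n", i, k
    }'
}

scope_demo
```

The extra spaces before k in the parameter list are only a convention to separate real arguments from locals.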

Have fun,
-j
Kenny McCormack
2024-05-13 06:56:50 UTC
In article <v1pi7c$2b87j$***@dont-email.me>,
David Chmelik <***@gmail.com> wrote:
...
Post by David Chmelik
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
while(getline<file) arr[$1]=$0
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
}
------------------------------------------------------------------------
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to
I guess this is what you actually want:

{ A[$1] = $2 }
END {
len = length(A)
for (i=1; i<=len; i++)
printf("%s%s",A[i],i<len ? " " : "\n")
}
--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/Noam
Kenny McCormack
2024-05-13 14:53:38 UTC
In article <v1sdji$tofu$***@news.xmission.com>,
Kenny McCormack <***@shell.xmission.com> wrote:
...
Post by Kenny McCormack
{ A[$1] = $2 }
END {
len = length(A)
for (i=1; i<=len; i++)
printf("%s%s",A[i],i<len ? " " : "\n")
}
Improved version:

{ A[$1] = $2 }
END {
for (i=1; i<=NR; i++)
printf("%s%s",A[i],i<NR ? " " : "\n")
}

Note that the value of NR in END is sort of a gray area, but it works as
expected in GAWK, which is really all we care about.
--
[Donald] Trump didn't have it all handed to him by his parents,
like Hillary Clinton did.

- Some dumb cluck in Ohio; featured in Michael Moore's "Trumpland" -
Janis Papanagnou
2024-05-13 08:18:40 UTC
Post by David Chmelik
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)... is this normal (and I made mistake?) or am I
approaching it wrong? I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being specified...
IIUC you meanwhile have your script running, and probably code similar
to

BEGIN { print_file_words("data.txt"); }

function print_file_words(file) {
while (getline <file >0)
arr[$1] = $0
PROCINFO["sorted_in"] = "@ind_num_asc"
for (i in arr) {
split (arr[i], arr2)
printf "%s ", arr2[2]
}
printf "\n"
}

I suggest adding the '>0' test to your code, and also printing a final
"\n" so that your command line prompt doesn't overwrite your output.
Note also that printf (like print) is a command, not a function. Adding
local variable declarations is also sensible, to avoid problems if
you use your code in other source code contexts.

Janis
Post by David Chmelik
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
while(getline<file) arr[$1]=$0
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
}
------------------------------------------------------------------------
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to
Kaz Kylheku
2024-05-13 17:17:05 UTC
Post by David Chmelik
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to
$ awk '{
if ($1 > max) max = $1;
rank[$1] = $2
}

END {
for (i = 1; i <= max; i++)
if (i in rank) {
printf("%s%s", sep, rank[i]);
sep = " "
}
print ""
}' data.txt
all your base are belong to us

We do not perform any sort, and so we don't require GNU extensions. Sorting is
silly, because data is already sorted: we are given the positional rank of
every word, which is a way of capturing order. All we have to do is visit the
words in that order.

We can do that by iterating an index i from 1 to the highest index
we have seen. If there is a rank[i] entry, then we print it.
(We do this "(i in rank)" check in case there are gaps in the rank
sequence.)

After we print one word, we start using the " " separator before all
subsequent words.

If we must sort, there is the sort utility:

$ sort -n data.txt | awk '{ printf("%s%s", sep, $2); sep = " " }' && echo
all your base are belong to us

Also, if we can suffer a spurious trailing space:

$ sort -n data.txt | awk '{ print $2 }' | tr '\n' ' ' && echo
all your base are belong to us
Kenny McCormack
2024-05-13 17:26:56 UTC
In article <***@kylheku.com>,
Kaz Kylheku <643-408-***@kylheku.com> wrote:
...
(This version more complicated than it needs to be, but essentially the
same as what I posted earlier)
Post by Kaz Kylheku
$ awk '{
if ($1 > max) max = $1;
rank[$1] = $2
}
END {
for (i = 1; i <= max; i++)
if (i in rank) {
printf("%s%s", sep, rank[i]);
sep = " "
}
print ""
}' data.txt
all your base are belong to us
We do not perform any sort, and so we don't require GNU extensions. Sorting is
But GNU extensions are good - especially since OP specifically mentioned
using GAWK. And much more on-topic than Lisp (et al).

Final note: In fact, it has been established (on this newsgroup as well as
empirically by me and others) that if the indices are small integers, you
get sorting for free (in GAWK, which, as noted, is all we care about). So,
you don't even really need to mess with PROCINFO[]...

And, one more note about sorting. Some responders on this thread have
gotten confused about what is to be sorted. They assumed that OP wanted
the words sorted (alphabetically), when, in fact, he just wants them sorted
(numerically) by the position number (the first field in the data line).
--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/Mandela
Kaz Kylheku
2024-05-13 23:33:07 UTC
Post by Kenny McCormack
...
(This version more complicated than it needs to be, but essentially the
same as what I posted earlier)
Post by Kaz Kylheku
$ awk '{
if ($1 > max) max = $1;
rank[$1] = $2
}
END {
for (i = 1; i <= max; i++)
if (i in rank) {
printf("%s%s", sep, rank[i]);
sep = " "
}
print ""
}' data.txt
all your base are belong to us
We do not perform any sort, and so we don't require GNU extensions. Sorting is
But GNU extensions are good - especially since OP specifically mentioned
using GAWK. And much more on-topic than Lisp (et al).
The above performs O(N) steps, whereas sorting is O(N log N),
and sometimes worse due to degenerate cases in some algorithms.

Why use an extension that only makes the program more verbose and brings
in an unnecessary algorithm?
Post by Kenny McCormack
Final note: In fact, it has been established (on this newsgroup as well as
empirically by me and others) that if the indices are small integers, you
get sorting for free (in GAWK, which, as noted, is all we care about). So,
you don't even really need to mess with PROCINFO[]...
Are you referring to the idea of just replacing the above for + if
structure with:

for (i in rank) {

}

and relying on the small integer indices being hashed in order?

Where is that documented? The manual reiterates that this is not
specified: "By default, the order in which a ‘for (indx in array)’ loop
scans an array is not defined; it is generally based upon the internal
implementation of arrays inside awk."
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Kenny McCormack
2024-05-14 13:40:21 UTC
In article <***@kylheku.com>,
Kaz Kylheku <643-408-***@kylheku.com> wrote:
...
Post by Kaz Kylheku
Post by Kenny McCormack
Final note: In fact, it has been established (on this newsgroup as well as
empirically by me and others) that if the indices are small integers, you
get sorting for free (in GAWK, which, as noted, is all we care about). So,
you don't even really need to mess with PROCINFO[]...
Are you referring to the idea of just replacing the above for + if
for (i in rank) {
}
and relying on the small integer indices being hashed in order?
Yes.
Post by Kaz Kylheku
Where is that documented? The manual reiterates that this is not
specified: "By default, the order in which a for (indx in array) loop
scans an array is not defined; it is generally based upon the internal
implementation of arrays inside awk."
It is documented in this newsgroup (Google is your friend).
And assented to by one or both of the GAWK insiders who are known to post here.
It seems to be an attribute (i.e., quirk) of the particular hashing
algorithm used.

Now, of course it isn't guaranteed and could disappear in some future
version of GAWK - and, of course, one wouldn't rely on it in production
code, since it is so easy to make it right by including the line (shown in
this thread's OP) that sets PROCINFO[].

But it is true, nonetheless.
--
The key difference between faith and science is that in science, evidence that
doesn't fit the theory tends to weaken the theory (that is, make it less likely to
be believed), whereas in faith, contrary evidence just makes faith stronger (on
the assumption that Satan is testing you - trying to make you abandon your faith).
Ed Morton
2024-05-16 13:11:35 UTC
Post by David Chmelik
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)...
Your input file probably has DOS line endings, see
https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it
for what that means and how to deal with them but basically either run
`dos2unix` on your file before calling awk or add `sub(/\r$/,"")` as I
show below*.

Post by David Chmelik
is this normal (and I made mistake?) or am I
approaching it wrong? I recall BASIC prints new lines, but as I learned
basic C and some derivatives, I'm used to newlines only being specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
Move the above to a BEGIN section so it is executed once total instead
of once per input line.
Post by David Chmelik
while(getline<file) arr[$1]=$0
The above would spin off into an infinite loop if getline failed since
in that case it'd return a negative number which would still evaluate to
"true" when tested as a condition. It needs to be:

while ( (getline < file) > 0 ) arr[$1] = $0

See http://awk.freeshell.org/AllAboutGetline for that and more info on
using getline.
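
To see the three return values for yourself, here is a small sketch, assuming a POSIX shell and any POSIX awk; getline_status is a made-up helper name and the path is just an example of a file that does not exist:

```shell
# getline_status: hypothetical helper; prints what getline returns for a file.
getline_status() {
    awk -v file="$1" 'BEGIN {
        # 1 = a record was read, 0 = end of file, -1 = error (e.g. no such file)
        status = (getline line < file)
        print status
    }'
}

# An unreadable file makes getline report an error instead of end of file.
getline_status /nonexistent/data.txt
```

Since -1 is non-zero, a bare `while (getline < file)` treats the error as "true" and never terminates, which is why the comparison with 0 matters.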

*This is where you'd strip CRs from the end of input lines. Do either of
these, the first uses a non-POSIX extension function gensub() (which
gawk has), the second would work in any awk:

a) while ( (getline < file) > 0 ) arr[$1] = gensub(/\r$/,"",1)

b) while ( (getline < file) > 0 ) { sub(/\r$/,""); arr[$1] = $0 }
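
A sketch of variant (b) applied to ordinary main-input processing, assuming a POSIX shell and any POSIX awk; join_words is a hypothetical name. If the file really has DOS line endings, each word carries a trailing CR, the terminal returns to column 0 after every word, and later words overwrite earlier ones, which would fit the "all I see is a space" symptom:

```shell
# join_words: hypothetical helper. Strips a trailing CR from each record,
# then prints the second field of every line on one output line.
join_words() {
    awk '{
        sub(/\r$/, "")               # drop the CR left by DOS line endings
        printf "%s%s", sep, $2       # sep is empty before the first word
        sep = " "
    }
    END { print "" }' "$@"
}

# Two CRLF-terminated records, read from standard input.
printf '2 your\r\n1 all\r\n' | join_words
```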
Post by David Chmelik
for(i in arr)
{
split(arr[i],arr2)
# output all words or on one line with ORS
print arr2[2]
# output all words on one line without needing ORS
#printf("%s ",arr2[2])
}
Add `print RS` after the loop if you had set ORS to a blank so the
output ends in a newline and therefore is a valid POSIX text file,
otherwise YMMV with what subsequent text processing tools can do with it.

Ed.
Post by David Chmelik
}
------------------------------------------------------------------------
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to
Janis Papanagnou
2024-05-16 13:55:35 UTC
Post by Ed Morton
Post by David Chmelik
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)...
[...]
Post by David Chmelik
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
Move the above to a BEGIN section so it is executed once total instead
of once per input line.
A function definition called once from the BEGIN section isn't
called "once per input line".

Janis
Post by Ed Morton
Post by David Chmelik
[...]
Kenny McCormack
2024-05-16 14:15:59 UTC
In article <v2538p$1jmvm$***@dont-email.me>,
Janis Papanagnou <janis_papanagnou+***@hotmail.com> wrote:
...
Post by Janis Papanagnou
A function definition called once from the BEGIN section isn't
called "once per input line".
Especially since it is commented out, so it executes exactly zero times.

Actually setting ORS (or any other similar variable) inside a function
definition is not such a bad idea, in terms of modularity.
--
To all the people worried about how bad it would look to have a public trial of a
former president (and all the usual verbiage that we heard in 1974), I say this to DJT:
Just plead guilty, take your medicine, do your time, just fade away.
For the good of the country. Do the right thing.
Kenny McCormack
2024-05-16 15:17:42 UTC
Post by Kenny McCormack
...
Post by Janis Papanagnou
A function definition called once from the BEGIN section isn't
called "once per input line".
Especially since it is commented out, so it executes exactly zero times.
Actually setting ORS (or any other similar variable) inside a function
definition is not such a bad idea, in terms of modularity.
In fact, I'd like to expand on that. It is commonly held that a
well-written function that changes the values of "special variables" should
save and restore them. I.e.:

function foo(arg1, arg2, ...) {
oldORS = ORS
ORS = new value
...
ORS = oldORS
}
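
A runnable sketch of that save-and-restore pattern, assuming a hypothetical helper print_joined() and made-up sample words. Here old_ors is listed as an extra parameter so that it stays local to the function rather than leaking into the global namespace:

```shell
# demo_restore is a hypothetical wrapper so the sketch can be pasted into a shell.
demo_restore() {
    awk '
    # print_joined() is an illustrative helper, not code from this thread.
    # i and old_ors are extra parameters, the usual AWK idiom for locals.
    function print_joined(arr, n,    i, old_ors) {
        old_ors = ORS              # save the current output record separator
        ORS = " "                  # temporary value while joining the words
        for (i = 1; i <= n; i++)
            print arr[i]
        ORS = old_ors              # restore it before returning
        print ""                   # emits just the restored ORS (a newline here)
    }
    BEGIN {
        n = split("all your base", words, " ")
        print_joined(words, n)
        print "done"               # lands on its own line because ORS was restored
    }'
}
demo_restore
```

The second print shows that the caller still sees the default ORS after the function returns.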

But in fact, in practice, this can get tricky - due to vagaries of the AWK
language. What would really be nice is if you could declare special
variables in the parameter list - which would give them the "local
variable" treatment. I.e.:

function foo(arg1, arg2, ..., ORS) {
ORS = new value
...
}

Now, ORS would be magically restored to its previous value w/o the function
having to deal with it (**). Unfortunately, neither GAWK nor TAWK allows this.
GAWK gives an error message saying you can't use special variables in arg
lists. TAWK just silently ignores the attempt.

What would be even better is if this happened magically w/o needing to do
the above parameter trick. An argument can be made that changes to special
variables should, by default, be local to functions. Now, as it happens,
this would break one of my functions - which I call "setsort", which sets
PROCINFO["sorted_in"] for me. Basically, I can never remember the special
names of the internal sorting functions (e.g., @ind_whatever), so I wrote a
function setsort() and can now just do: setsort(1) to get the most commonly
used sorting functionality. I find it easier to remember the numbers than
to remember the exact spelling of those names.
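
A minimal sketch of what such a setsort() might look like. The mapping from numbers to names is an assumption for illustration, not the actual function, and it needs gawk because PROCINFO["sorted_in"] is a gawk extension:

```shell
# setsort_demo is a hypothetical wrapper; it needs gawk, not a POSIX-only awk.
setsort_demo() {
    gawk '
    # Assumed mapping from small numbers to gawk sort-order names.
    function setsort(n) {
        if (n == 1)      PROCINFO["sorted_in"] = "@ind_num_asc"
        else if (n == 2) PROCINFO["sorted_in"] = "@ind_str_asc"
        else if (n == 3) PROCINFO["sorted_in"] = "@val_str_asc"
        else             PROCINFO["sorted_in"] = "@unsorted"
    }
    BEGIN {
        arr[2] = "your"; arr[1] = "all"; arr[3] = "base"   # made-up sample data
        setsort(1)                 # traverse by numeric index, ascending
        for (i in arr)
            printf "%s ", arr[i]
        printf "\n"
    }'
}
setsort_demo
```

Since PROCINFO is global, the setting made inside setsort() is still in effect for the loop in BEGIN, which is exactly the behaviour that local-by-default special variables would break.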

This, in turn, could be fixed if there was a "global" statement that would
make a selected variable global rather than local (*). This is, in part,
inspired by Tcl syntax, where everything is local by default and you have
to explicitly use "global var" to make "var" global. I've often thought
that, if it could be done all over again, AWK might be better if it had
followed the Tcl model for function variables. Of course, it can't be
changed now.

(*) So, in my setsort() function, I would write: global PROCINFO
and that would make changes to PROCINFO visible to the caller.

(**) Or, you could even pass a value for ORS in as part of the function call.
--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/PennJillette
Ed Morton
2024-05-17 00:40:18 UTC
Post by Janis Papanagnou
Post by Ed Morton
Post by David Chmelik
I'm learning more AWK basics and wrote function to read file, sort,
print. I use GNU AWK (gawk) and its sort but printing is harder to get
working than anything... separate lines work, but when I use printf() or
set ORS then use print (for words one line) all awk outputs (on FreeBSD
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
before shell prompt)...
[...]
Post by David Chmelik
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }
# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
Move the above to a BEGIN section so it is executed once total instead
of once per input line.
A function definition called once from the BEGIN section isn't
called "once per input line".
I didn't notice the function keyword nestled in the preceding comments
and didn't give it much thought, thanks for pointing that out.

Ed.
