Discussion:
Nth (Ordinal Numeral Suffix)
Mike Sanders
2023-11-03 19:30:50 UTC
# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral

function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}

{
delete v
split($0, v)
for (x in v) print nth(v[x])
}

# eof
--
:wq
Mike Sanders
Mike Sanders
2023-11-03 19:57:25 UTC
Post by Mike Sanders
function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}
On 2nd thought, I think this could be better rendered as:

# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral

function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}

{
delete v
split($0, v)
for (x in v) print nth(v[x])
}

# eof
--
:wq
Mike Sanders
Ben Bacarisse
2023-11-03 20:49:53 UTC
Post by Mike Sanders
Post by Mike Sanders
function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}
That's not really what "better rendered" means. The two bits of code
are functionally very different.
Post by Mike Sanders
# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# https://en.wikipedia.org/wiki/Ordinal_numeral
function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}
{
delete v
split($0, v)
for (x in v) print nth(v[x])
This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.

Especially as (as you probably know) you can scan the fields in a line,
in order, like this

for (i = 1; i <= NF; i++) print nth($i)
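
A minimal sketch of the in-order driver, wrapped in a shell function so it can be run as is. The wrapper name nth_fields is made up for this illustration; the awk body is the second nth() from earlier in the thread with the field loop suggested here.

```sh
# nth_fields is a hypothetical wrapper name, not part of the posted script.
nth_fields() {
    printf '%s\n' "$*" | awk '
    function nth(num) {
        if (num ~ /^[0-9]+$/) {
            if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
                if (num % 10 == 1) return num "st"
                if (num % 10 == 2) return num "nd"
                if (num % 10 == 3) return num "rd"
            }
            return num "th"
        }
        return num
    }
    # Walk the fields by index so the output order matches the input order.
    { for (i = 1; i <= NF; i++) print nth($i) }'
}

nth_fields 101 42 23 98 foo
```

Because the loop counts from 1 to NF, the result can be compared line by line against an expected list, which is what makes automatic testing straightforward.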
Post by Mike Sanders
}
# eof
--
Ben.
Mike Sanders
2023-11-03 21:47:46 UTC
Ben Bacarisse <***@bsb.me.uk> wrote:

Hey Ben =)
Post by Ben Bacarisse
That's not really what "better rendered" means. The two bits of code
are functionally very different.
Oh c'mon now you're being fussy on this point & besides for you or me?
The distinction is important because you're speaking for yourself
& using that same logic, since I wrote the snippet, I can define my
own grammar no? Anyone can plainly read the 1st & 2nd versions of the
script & discern the differences. But 'quibble not'.
Post by Ben Bacarisse
This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.
Nothing odd about it, I believe several implementations of awk using:

'for (x in array)...'

say the output is not guaranteed to be in sequential order BUT...

Aye - I'll concede this point kind sir & update the script accordingly as
it is more in line with what the user would expect (& less code to boot).

So script updated as per your suggestion:

https://busybox.neocities.org/notes/nth.txt

Good catch Ben & thank you.
--
:wq
Mike Sanders
Ben Bacarisse
2023-11-03 22:06:54 UTC
Post by Mike Sanders
Hey Ben =)
Post by Ben Bacarisse
That's not really what "better rendered" means. The two bits of code
are functionally very different.
Oh c'mon now you're being fussy on this point & besides for you or me?
This is a very short function, so maybe a reader will see that the two
do different things, but in general I would not necessarily take a new
copy if someone posted a "better rendering" of some code. I would
expect at most superficial, aesthetic changes.

I don't want to assume you are a native speaker of English, so it's
possible that you don't know how minor a change "a better rendering" of
something is likely to be.

And I don't know what you mean by "& besides for you or me?".
Post by Mike Sanders
The distinction is important because you're speaking for yourself
& using that same logic, since I wrote the snippet, I can define my
own grammar no? Anyone can plainly read the 1st & 2nd versions of the
script & discern the differences. But 'quibble not'.
I don't follow this.
Post by Mike Sanders
Post by Ben Bacarisse
This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.
'for (x in array)...'
say the output is not guaranteed to be in sequential order BUT...
Aye - I'll concede this point kind sir & update the script accordingly as
it is more in line with what the user would expect (& less code to boot).
https://busybox.neocities.org/notes/nth.txt
Good catch Ben & thank you.
You're welcome.
--
Ben.
Mike Sanders
2023-11-04 07:35:30 UTC
Post by Ben Bacarisse
I don't follow this.
No biggie Ben (it was my lame attempt at being facetious).
Ultimately the burden of clarity lies squarely on the
shoulders of the poster, and in this case, that would be me.
--
:wq
Mike Sanders
Janis Papanagnou
2023-11-03 22:14:57 UTC
Post by Mike Sanders
Post by Mike Sanders
function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}
[...]
function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}
[...]
Hi Mike, I like your second version better since it doesn't _mix_
arithmetic with pattern comparisons. (Okay, there's still the
initial pattern, but as an overall test pattern that's fine, IMO.)

I had written such a function in shell and it was using patterns

case ${num} in
(*[!0-9]*) x="" ;;
(*11|*12|*13) x=th ;;
(*1) x=st ;;
(*2) x=nd ;;
(*3) x=rd ;;
(*) x=th ;;
esac
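
A runnable sketch of the same case statement as a shell function. The name ordinal_suffix is an assumption made for this example, the non-digit test uses the POSIX bracket negation [!0-9], and an empty argument is treated like a non-number.

```sh
# ordinal_suffix is a hypothetical name chosen for this sketch.
ordinal_suffix() {
    case $1 in
        ''|*[!0-9]*) x="" ;;   # empty, or contains a non-digit: no suffix
        *11|*12|*13) x=th ;;   # the teens come before the last-digit rules
        *1) x=st ;;
        *2) x=nd ;;
        *3) x=rd ;;
        *)  x=th ;;
    esac
    printf '%s%s\n' "$1" "$x"
}

ordinal_suffix 112
ordinal_suffix 23
ordinal_suffix foo
```

The order of the branches matters: the *11|*12|*13 branch has to be tested before *1, *2 and *3, otherwise 112 would pick up "nd".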

I think (in shell) patterns are more legible. But also the Awk
transcript with patterns has a good legibility and reflects the
(literal) definition of the definition (e.g. Wikipedia)

switch (num) {
case /[^0-9]/: x="" ; break ;
case /11$|12$|13$/: x="th" ; break ;
case /1$/: x="st" ; break ;
case /2$/: x="nd" ; break ;
case /3$/: x="rd" ; break ;
default: x="th" ; break ;
}

(I've used GNU Awk's switch, but it can also be written with 'if'.)

Take care when using anchors; in your first version with /^1[1-3]$/
you were matching only three numbers. Maybe /1[1-3]$/ was intended?
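
To make the anchor point concrete, here is a small sketch that compares the two patterns on a few inputs. The function name anchor_demo and the yes/no output columns are invented for this illustration.

```sh
# anchor_demo is a hypothetical helper: prints "<input> <full-match> <tail-match>".
anchor_demo() {
    printf '%s\n' "$@" | awk '{
        full = ($0 ~ /^1[1-3]$/) ? "yes" : "no"   # whole string must be 11, 12 or 13
        tail = ($0 ~ /1[1-3]$/)  ? "yes" : "no"   # only the last two digits are checked
        print $0, full, tail
    }'
}

anchor_demo 11 111 213 21
```

With both anchors only 11, 12 and 13 match; with just the trailing $ the pattern also catches 111, 213 and any other number ending in 11 to 13.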

Janis
Janis Papanagnou
2023-11-03 22:24:24 UTC
Post by Janis Papanagnou
[...]
Hi Mike, I like your second version better since it doesn't _mix_
arithmetic with pattern comparisons. (Okay, there's still the
initial pattern, but as an overall test pattern that's fine, IMO.)
Just one additional comment about why I like the pattern approach
better; three levels of nested 'if' make the code unnecessarily
hard to read, especially in comparison.
Post by Janis Papanagnou
[...]
I think (in shell) patterns are better legible. But also the Awk
transcript with patterns has a good legibility and reflects the
(literal) definition of the definition (e.g. Wikipedia)
"(literal) description (e.g. of the Wikipedia definition)."

(Sorry for my sloppy writing.)

Janis
Mike Sanders
2023-11-04 07:38:00 UTC
Janis Papanagnou <janis_papanagnou+***@hotmail.com> wrote:

Yes, thinking the same here Janis & even still, the 1st version seemed
a little off. And the 1st pattern? Prevents 'Footh' (chuckle sounds
silly to even write much less speak aloud).
Post by Janis Papanagnou
I think (in shell) patterns are better legible. But also the Awk
transcript with patterns has a good legibility and reflects the
(literal) definition of the definition (e.g. Wikipedia)
switch (num) {
case /[^0-9]/: x="" ; break ;
case /11$|12$|13$/: x="th" ; break ;
case /1$/: x="st" ; break ;
case /2$/: x="nd" ; break ;
case /3$/: x="rd" ; break ;
default: x="th" ; break ;
}
Sure enough, it is very legible & concise at least to my eyes.
Post by Janis Papanagnou
(I've used GNU Awk's switch, but it can also be written with 'if'.)
I know, Arnold has done an outstanding job with Gawk, 'case' is very
practical & function pointers too, those are so nifty!
Post by Janis Papanagnou
Take care when using anchors; in your first version with /^1[1-3]$/
you were matching only three numbers. Maybe /1[1-3]$/ was intended?
Yeah, the whole thing was sort of a mess (I'd forgotten I had that script).
Post by Janis Papanagnou
(Sorry for my sloppy writing.)
Shoot, no worries Janis. My writing is hardly ever error three.

No wait! I meant 'error free' =)

You know, where I call home, here in the Prairies of North America,
our dialect of English is very colloquial (meaning informal, or rustic).
For instance, if I wanted to ask another if s/he agreed that a fence
was constructed in a robust & strong way, I might ask:

Q: She's hell built for stout, yeah?

A: Sure enough, if ever there was, she is.

...so you can see it's relative. We at comp.lang.awk can work it out.

Also, my earnest thanks to all for putting up with my flood of posts.
Sometimes, when you have an itch, well you have to scratch, & that's
where I'm at right now it seems.

Well folks, I'm off for the weekend. My 5yr old granddaughter is en-route
even as I write this & she's just beginning to learn to read. And I'll be
front & center to witness her recite either 'Curious George' or
'Cat In The Hat'. She's so excited she's beside herself & I want to
honor her efforts at greater cognition. =)
--
:wq
Mike Sanders
Bruce Horrocks
2023-11-03 23:40:45 UTC
Post by Mike Sanders
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
You could trivially re-write this line as

if (num % 100 < 11 || num % 100 > 13) {

to save a comparison but the logic is slightly less clear.

Even less clear is to re-write as

if (num % 100 > 13 || num % 100 < 11) {

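to take better advantage of lazy (short-circuit) evaluation, since most
values of num % 100 are greater than 13 and are settled by the first test.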
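
A quick sanity check, as a sketch: the three conditions agree for every integer remainder 0..99, which is all that can reach them once the /^[0-9]+$/ guard has passed. The function name check_equiv is made up for this example.

```sh
# check_equiv is a hypothetical helper: prints how many remainders disagree.
check_equiv() {
    awk 'BEGIN {
        bad = 0
        for (n = 0; n < 100; n++) {
            a = (n % 100 != 11 && n % 100 != 12 && n % 100 != 13)   # original test
            b = (n % 100 < 11 || n % 100 > 13)                      # first rewrite
            c = (n % 100 > 13 || n % 100 < 11)                      # reordered rewrite
            if (a != b || b != c) bad++
        }
        print bad
    }'
}

check_equiv
```

The range forms are only equivalent for whole numbers; a value such as 11.5 would pass the original test but fail the rewrites, which is harmless here because non-integers never get past the digit check.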
--
Bruce Horrocks
Surrey, England
Mike Sanders
2023-11-04 07:39:32 UTC
Post by Bruce Horrocks
You could trivially re-write this line as
if (num % 100 < 11 || num % 100 > 13) {
to save a comparison but the logic is slightly less clear.
Even less clear is to re-write as
if (num % 100 > 13 || num % 100 < 11) {
to take better advantage of lazy evaluation.
Though the latter edges out the former, I'll take your 1st
construct Bruce just to keep a little clarity (Lord knows
I need it, chuckle).

Script updated & also added contributing author's names:

https://busybox.neocities.org/notes/nth.txt
--
:wq
Mike Sanders
Ed Morton
2023-11-05 14:04:30 UTC
Post by Mike Sanders
Post by Mike Sanders
function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}
# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# https://en.wikipedia.org/wiki/Ordinal_numeral
function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}
{
delete v
`split($0,v)` will delete v before repopulating it, no need to do it
explicitly before calling `split()` plus that would make your code
non-portable as `delete array` isn't defined by POSIX (yet).
Post by Mike Sanders
split($0, v)
for (x in v) print nth(v[x])
That would print the output in a "random" order, do `for (x=1; x in v;
x++)` instead to get the same output order as the input order.

You don't need split() and an array at all, though, all you need is `for
(x=1; x<=NF; x++) print nth($x)`.
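
A short sketch of the ordered array walk described above, for cases where the array is needed anyway. The wrapper name split_in_order is hypothetical; split() returns the element count, which is used here only to place the separators.

```sh
# split_in_order is a hypothetical wrapper name for this illustration.
split_in_order() {
    printf '%s\n' "$*" | awk '{
        n = split($0, v)            # split() clears v, then fills v[1]..v[n]
        for (x = 1; x in v; x++)    # stops at the first missing index
            printf "%s%s", v[x], (x < n ? " " : "\n")
    }'
}

split_in_order c a b
```

Since split() numbers the elements consecutively from 1, the `x in v` test ends the loop right after the last element without needing the count.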
Post by Mike Sanders
}
Consider doing this instead (untested) to address the above points and
for improved efficiency:

BEGIN {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

{
for (x=1; x<=NF; x++) print nth($x)
}

Regards,

Ed.
Ed Morton
2023-11-05 14:13:46 UTC
Post by Ed Morton
Post by Mike Sanders
Post by Mike Sanders
function nth(day) {
   if (day ~ /^[0-9]+$/) {
     if (day ~ /^1[1-3]$/ || day > 20) {
       if (day % 10 == 1) return day "st"
       if (day % 10 == 2) return day "nd"
       if (day % 10 == 3) return day "rd"
     }
       return day "th"
   }
   return day
}
# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# https://en.wikipedia.org/wiki/Ordinal_numeral
function nth(num) {
   if (num ~ /^[0-9]+$/) {
     if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
       if (num % 10 == 1) return num "st"
       if (num % 10 == 2) return num "nd"
       if (num % 10 == 3) return num "rd"
     }
     return num "th"
   }
   return num
}
{
   delete v
`split($0,v)` will delete v before repopulating it, no need to do it
explicitly before calling `split()` plus that would make your code
non-portable as `delete array` isn't defined by POSIX (yet).
Post by Mike Sanders
   split($0, v)
   for (x in v) print nth(v[x])
That would print the output in a "random" order, do `for (x=1; x in v;
x++)` instead to get the same output order as the input order.
You don't need split() and an array at all, though, all you need is `for
(x=1; x<=NF; x++) print nth($x)`.
Post by Mike Sanders
}
Consider doing this instead (untested) to address the above points and
for improved efficiency:
BEGIN {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
}
function nth(num,       sfx) {
   if (num ~ /^[0-9]+$/) {
      if ( !((num % 100) in huns) ) {
         sfx = tens[num % 10]
      }
   }
   return num sfx
}
{
   for (x=1; x<=NF; x++) print nth($x)
}
Regards,
    Ed.
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

You may want to come up with some naming convention for huns[] and
tens[] to make it clear they're global and avoid clashing with anything
else of the same name anywhere else in the script such as prefixing them
with the name of the function that uses them, "Nth_huns", or some common
indicator you use for all global variables, e.g. "G_huns" or whatever
else makes sense to you.
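
One way the naming suggestion might look, as a sketch only: the lazily initialised tables carry an "Nth_" prefix, and the 11..13 case is assigned "th" explicitly so that every number gets a suffix. The wrapper name nth_lazy is an assumption of this example.

```sh
# nth_lazy is a hypothetical wrapper; Nth_huns and Nth_tens follow the
# prefix-with-function-name convention suggested above.
nth_lazy() {
    printf '%s\n' "$*" | awk '
    function nth(num,    sfx) {
        if (num ~ /^[0-9]+$/) {
            if ( !(1 in Nth_tens) ) {          # build the tables on first use
                Nth_huns[11]; Nth_huns[12]; Nth_huns[13]
                split("st nd rd th th th th th th", Nth_tens)
                Nth_tens[0] = "th"
            }
            sfx = ( (num % 100) in Nth_huns ? "th" : Nth_tens[num % 10] )
        }
        return num sfx
    }
    { for (x = 1; x <= NF; x++) print nth($x) }'
}

nth_lazy 1 12 22 100 foo
```

The extra spaces before sfx in the parameter list are the usual awk idiom for marking it as a local variable rather than an argument.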

Regards,

Ed.
Janis Papanagnou
2023-11-05 16:21:52 UTC
Hi Ed!
Post by Ed Morton
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:
function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}
I don't see where the advantage here is. It is (IMO) unnecessarily complex
(many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).

Simple pattern matches would be straightforward for such a primitive and
certainly not time-critical[**] function like "nth()".

Janis
Post by Ed Morton
[...]
[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.

[**] In case that would have been the reason for this implementation.
Janis Papanagnou
2023-11-05 17:01:41 UTC
Post by Janis Papanagnou
Hi Ed!
Post by Ed Morton
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:
function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}
I don't see where the advantage here is. It is (IMO) unnecessarily complex
(many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).
Simple pattern matches would be straightforward for such a primitive and
certainly not time-critical[**] function like "nth()".
Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm

function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}

For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)
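
The timing harness itself was not posted, so the following is only a guess at its general shape: call the function in a loop inside BEGIN and time the whole awk run from the shell. The name bench_nth and the default call count are assumptions of this sketch.

```sh
# bench_nth is a hypothetical harness; $1 is the number of calls.
bench_nth() {
    awk -v n="${1:-100000}" '
    function nth(num) {
        if (num ~ /[^0-9]/) return num
        else if (num ~ /1[1-3]$/) return num "th"
        else if (num ~ /1$/) return num "st"
        else if (num ~ /2$/) return num "nd"
        else if (num ~ /3$/) return num "rd"
        else return num "th"
    }
    BEGIN {
        for (i = 1; i <= n; i++) last = nth(i)
        print last      # print one result so the loop has a visible effect
    }'
}

# For a real measurement something like: time bench_nth 10000000
bench_nth 1000
```

Running the same loop with an empty function body gives the loop overhead, which can then be subtracted from each variant's time.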
Post by Janis Papanagnou
Janis
Post by Ed Morton
[...]
[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.
[**] In case that would have been the reason for this implementation.
Ed Morton
2023-11-05 18:17:26 UTC
On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
<snip>
Post by Janis Papanagnou
Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm
function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}
For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)
Did you also test it with the OPs code that I was showing an alternative
implementation of or just with the above code which is yet another
alternative implementation? If so, what was the result of that run?

Ed.
Janis Papanagnou
2023-11-05 19:04:56 UTC
Post by Ed Morton
<snip>
Post by Janis Papanagnou
Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm
function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}
For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)
Did you also test it with the OPs code that I was showing an alternative
implementation of or just with the above code which is yet another
alternative implementation? If so, what was the result of that run?
Sorry, I was not interested in the OP's code. Since I had implemented
a shell version some years ago that was very readable code as opposed
to the OP's version (or your variant), that could also be implemented
in a more legible (and less complex) form in Awk, I abstained from
testing others' code; this is something the authors should do.

I obviously missed that your variant was just intended as an optimized
version of the OP's approach, so don't take my criticism too seriously.

Fast pre-calculated solutions can also be legible. Taking the idea of
your variant further can simplify it even more, e.g.

function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}

Building that array e[] should be explained, though, but that can be
easily done (IMO), e.g.:

function init_e (    i)   # extra parameter i keeps the loop counter local
{
for (i=0; i<=99; i++) # init with 'th' as the prevalent suffix
e[i] = "th"
for (i=1; i<=91; i+=7) { # exceptions to that are low digits 1..3
e[i++] = "st"
e[i++] = "nd"
e[i++] = "rd"
}
e[11] = e[12] = e[13] = "th" # and exception to that are 11..13
}

(something like that).
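
Put together as one runnable sketch: the table is built once in BEGIN and every field is looked up by its last two digits. The wrapper name nth_table is hypothetical, and the loop counter in init_e is declared as a local parameter here so it cannot clash with a caller's variable.

```sh
# nth_table is a hypothetical wrapper around the precalculated-array idea.
nth_table() {
    printf '%s\n' "$*" | awk '
    function init_e(    i) {
        for (i = 0; i <= 99; i++)          # "th" is the prevalent suffix
            e[i] = "th"
        for (i = 1; i <= 91; i += 7) {     # i runs 1, 11, 21, ... 91
            e[i++] = "st"
            e[i++] = "nd"
            e[i++] = "rd"
        }
        e[11] = e[12] = e[13] = "th"       # undo the teens
    }
    function nth_pre(num) {
        if (num ~ /[^0-9]/) return num
        return num e[num % 100]
    }
    BEGIN { init_e() }
    { for (k = 1; k <= NF; k++) print nth_pre($k) }'
}

nth_table 1 2 3 4 11 12 13 21 22 23 101 111 foo
```

Each pass of the second loop advances i by three through the post-increments and then by seven more, so it lands on the next number ending in 1.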

Janis
Post by Ed Morton
Ed.
Ed Morton
2023-11-06 12:54:21 UTC
Post by Janis Papanagnou
Post by Ed Morton
<snip>
Post by Janis Papanagnou
Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm
function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}
For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)
Did you also test it with the OPs code that I was showing an alternative
implementation of or just with the above code which is yet another
alternative implementation? If so, what was the result of that run?
Sorry, I was not interested in the OP's code. Since I had implemented
a shell version some years ago that was very readable code as opposed
to the OP's version (or your variant), that could also be implemented
in a better legible (and less complex) form in Awk, I abstained from
testing other's codes; this is something the authors should do.
I obviously missed that your variant was just intended as an optimized
version of the OP's approach, so don't take my criticism too serious.
Fast pre-calculated solutions can also be legible.
Apparently we just have different ideas of legible - to me a hash lookup
is the clear and obvious way to implement this rather than a bunch of
if/else regexp comparisons.
Post by Janis Papanagnou
Taking the idea of your variant further can simplify it even, e.g.
function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}
Building that array e[] should be explained, though, but that can be
easily done (IMO), e.g..
function init_e ()
{
for (i=0; i<=99; i++) # init with 'th' as the prevalent suffix
e[i] = "th"
for (i=1; i<=91; i+=7) { # exceptions to that are low digits 1..3
e[i++] = "st"
e[i++] = "nd"
e[i++] = "rd"
}
e[11] = e[12] = e[13] = "th" # and exception to that are 11..13
}
(something like that).
Janis
That's a very good idea. I'd use this:

function nth_pre (num)
{
return num (num ~ /[^0-9]/ ? "" : e[num%100])
}

to squeeze out the last bit of redundancy but that's nit-picking.

Ed.
Janis Papanagnou
2023-11-07 13:17:59 UTC
Post by Ed Morton
Post by Janis Papanagnou
Fast pre-calculated solutions can also be legible.
Apparently we just have different ideas of legible
(This makes no sense; given what I said here and what you say below.)

The advantage of the pattern approach is, though, that it matches
exactly the specification/definition[*], as the cases are typically
explained. - But I think it's boring to talk on that "ideas" level.
Post by Ed Morton
- to me a hash lookup
is the clear and obvious way to implement this rather than a bunch of
if/else regexp comparisons.
Post by Janis Papanagnou
Taking the idea of your variant further can simplify it even, e.g.
function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}
[...]
That's a very good idea. [...]
Yes, it's simple and legible. - No unnecessary 'if' cases and no hash
arrays ("huns" and "tens") that introduce unnecessary complexity
where you need only a single and clear mapping of the relevant digits.

Janis

[*] See for example https://en.wikipedia.org/wiki/Ordinal_suffix

Ed Morton
2023-11-05 18:14:53 UTC
Post by Janis Papanagnou
Hi Ed!
Post by Ed Morton
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:
function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}
I don't see where the advantage here is. It is (IMO) unnecessarily complex
(many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).
Not sure where you're seeing any of those things. There are fewer "if"s
than were in the OP's code. If by "incomplete branches" you mean "if"
without an "else", there's nothing wrong with that, and the OP's code had
more of them. There are no undefined variables, and IMO it's much simpler
than the original code. And that code above was just for "if you don't want to
use a BEGIN section for some reason" while the version I'd use is what I
originally posted:

BEGIN {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

which is simpler and faster again.
Post by Janis Papanagnou
Simple pattern matches would be straightforward for such a primitive and
certainly not time-critical[**] function like "nth()".
If the OP has a large input file and wants to add "th" or "nd" to the
end of numbers on each line then "nth()" is probably the only part of it
that IS time-critical.
Post by Janis Papanagnou
Janis
Post by Ed Morton
[...]
[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.
All I was trying to do was show an alternative implementation of the OPs
code, not solve the problem the OP was trying to solve, and all I did to
test it was check it produced the same output as the OPs script for the
sample input they provided, which it does:

OPs code:

$ echo 101 42 23 98 foo | awk -f nth.txt
101st
42nd
23rd
98th
foo

My code:

$ echo 101 42 23 98 foo | awk -f nth.awk
101st
42nd
23rd
98th
foo

So, could you elaborate and provide an example where my code fails and
the OPs succeeds?
Post by Janis Papanagnou
[**] In case that would have been the reason for this implementation.
The reason for this implementation is it's faster, simpler, and doesn't
contain duplicate code so it'll be easier to maintain.

Ed.
Ed Morton
2023-11-05 18:40:20 UTC
Post by Ed Morton
Post by Janis Papanagnou
Hi Ed!
Post by Ed Morton
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:
function nth(num,       sfx) {
     if (num ~ /^[0-9]+$/) {
        if ( !(1 in tens) ) {
           huns[11]; huns[12]; huns[13]
           split("st nd rd th th th th th th",tens)
           tens[0]="th"
        }
        if ( !((num % 100) in huns) ) {
           sfx = tens[num % 10]
        }
     }
     return num sfx
}
I don't see where the advantage here is. It is (IMO) unnecessarily complex
(many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).
Not sure where you're seeing any of those things. There are fewer "if"s
than were in the OP's code. If by "incomplete branches" you mean "if"
without an "else", there's nothing wrong with that, and the OP's code had
more of them. There are no undefined variables, and IMO it's much simpler
than the original code. And that code above was just for "if you don't want to
use a BEGIN section for some reason" while the version I'd use is what I
originally posted:
BEGIN {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
}
function nth(num,       sfx) {
   if (num ~ /^[0-9]+$/) {
      if ( !((num % 100) in huns) ) {
         sfx = tens[num % 10]
      }
   }
   return num sfx
}
which is simpler and faster again.
Post by Janis Papanagnou
Simple pattern matches would be straightforward for such a primitive and
certainly not time-critical[**] function like "nth()".
If the OP has a large input file and wants to add "th" or "nd" to the
end of numbers on each line then "nth()" is probably the only part of it
that IS time-critical.
Post by Janis Papanagnou
Janis
Post by Ed Morton
[...]
[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.
All I was trying to do was show an alternative implementation of the OPs
code, not solve the problem the OP was trying to solve, and all I did to
test it was check it produced the same output as the OPs script for the
sample input they provided, which it does:
$ echo 101 42 23 98 foo | awk -f nth.txt
101st
42nd
23rd
98th
foo
$ echo 101 42 23 98 foo | awk -f nth.awk
101st
42nd
23rd
98th
foo
So, could you elaborate and provide an example where my code fails and
the OPs succeeds?
Never mind, I see it - I wasn't assigning sfx for some numbers, fixed by
changing "nth()" to:

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
sfx = ( (num % 100) in huns ? "th" : tens[num % 10] )
}
return num sfx
}
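
For reference, a sketch that combines the BEGIN tables from the earlier post with this corrected nth() and runs it over the values that used to come out without a suffix. The wrapper name nth_lookup is invented for this example.

```sh
# nth_lookup is a hypothetical wrapper around the corrected lookup version.
nth_lookup() {
    printf '%s\n' "$*" | awk '
    BEGIN {
        huns[11]; huns[12]; huns[13]
        split("st nd rd th th th th th th", tens)
        tens[0] = "th"
    }
    function nth(num,    sfx) {
        if (num ~ /^[0-9]+$/) {
            # 11, 12 and 13 (mod 100) now get "th" instead of nothing.
            sfx = ( (num % 100) in huns ? "th" : tens[num % 10] )
        }
        return num sfx
    }
    { for (x = 1; x <= NF; x++) print nth($x) }'
}

nth_lookup 1 2 3 4 11 12 13 21 112 foo
```

A test input that includes 11, 12, 13 and a three-digit teen such as 112 exercises the branch that the original sample line (101 42 23 98 foo) never reached.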

Thanks for the heads up.

Ed.
Post by Ed Morton
Post by Janis Papanagnou
[**] In case that would have been the reason for this implementation.
The reason for this implementation is it's faster, simpler, and doesn't
contain duplicate code so it'll be easier to maintain.
    Ed.
Janis Papanagnou
2023-11-05 18:48:47 UTC
Post by Ed Morton
Post by Janis Papanagnou
Simple pattern matches would be straightforward for such a primitive and
certainly not time-critical[**] function like "nth()".
If the OP has a large input file and wants to add "th" or "nd" to the
end of numbers on each line then "nth()" is probably the only part of it
that IS time-critical.
Sorry, no. - The sample sizes I used are hilariously large.
Post by Ed Morton
All I was trying to do was show an alternative implementation of the OPs
code, not solve the problem the OP was trying to solve, and all I did to
test it was check it produced the same output as the OPs script for the
sample input they provided, which it does:
$ echo 101 42 23 98 foo | awk -f nth.txt
101st
42nd
23rd
98th
foo
$ echo 101 42 23 98 foo | awk -f nth.awk
101st
42nd
23rd
98th
foo
So, could you elaborate and provide an example where my code fails and
the OPs succeeds?
I've just checked the output of your code (not the OP's), and got

1st
2nd
3rd
4th
5th
6th
7th
8th
9th
10th
11
12
13
14th
15th
16th
17th
18th
19th
20th
...

My intention was *not* to understand where the coding problem was,
neither the original code nor the (derived?) variant.
Post by Ed Morton
Post by Janis Papanagnou
[**] In case that would have been the reason for this implementation.
The reason for this implementation is it's faster, simpler, and doesn't
contain duplicate code so it'll be easier to maintain.
Maybe. - Even though performance is actually no real issue, I've
tested a couple variants (with even larger data sets: 50 million).

Timings in seconds:

InitPre/LoopFunc:   0   11   (to not count invariants)

if/else-if:        71
if:                79
switch:            76
arithm/lookup:     68
precomp/lookup:    66

Taking the lookup approach even further with a precalculated array
of the first 100 numbers, the code gets yet _simpler_ (and faster)

function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}

Not that these variants would matter WRT performance; the difference
(pattern: 71s, your variant: 68s, precalculated array of 100 significant
numbers: 66s) is negligible. But code should be readable (if possible), IMO.

Janis