Discussion:
Reverse Polish Notation Parser
(too old to reply)
Mike Sanders
2023-11-06 02:26:38 UTC
Permalink
Ping Janis: Question... Are you interested in rewriting this
as a Gawk only implementation? Would be great for switch/case
statements IMO. If so, I'll add your version to the file at
my website.

Folks, assuming Janis says yes, lets work off that version
& use it as a baseline.

At any rate, a work in progress...

# tags: rpn, calc, numbers, awk, code
#
# awk reverse polish notation parser
# Michael Sanders 2023
# https://busybox.neocities.org/notes/rpn.txt
#
# operands...
#
# large numbers? this really depends on your awk
# decimal fractions? yes, via a hopfully not too
# clever regex that expects decimal points to be
# surrounded by digits eg...
#
# valid 0.5, invalid .5
# valid 5.0, invalid 5.
#
# operators...
#
# + addition
# - subtraction
# * multiplication
# / division
# % modulus
# ^ exponentiation (see footnotes)
#
# *always* surround input with 'quotes' as some RPN
# operators can be misconstrued as meta-characters
# by your shell, example...
#
# echo '0.089 2.5 + 2 * 3 ^' | awk -f rpn.txt
#
# arbitrary precision using printf format specifier...
#
# %0.0f 1
# %0.2f 1.00 (default)
# %0.9f 1.000000000
#
# further reading...
#
# https://en.wikipedia.org/wiki/Reverse_Polish_notation

{ RPN($0) }

function RPN(input, x, y, z, stack) {
split(input, stack, /[[:space:]]+/)
z = 0
for (x = 1; x in stack; ++x) {
y = stack[x]
if (y ~ /^[0-9]+(\.[0-9]+)?$/) stack[++z] = y
else {
if (z < 2) { print "error: insufficient operands"; return }
if (y == "+") stack[z-1] += stack[z]
else if (y == "-") stack[z-1] -= stack[z]
else if (y == "*") stack[z-1] *= stack[z]
else if (y == "^") stack[z-1] ^= stack[z] # see footnotes
else if (y == "/") {
if (stack[z] == 0) { print "error: division by zero"; return }
stack[z-1] /= stack[z]
} else if (y == "%") {
if (stack[z] == 0) { print "error: modulo by zero"; return }
stack[z-1] %= stack[z]
} else { print "error: invalid operator " y; return }
--z
}
}
if (z != 1) { print "error: invalid RPN expression"; return }
printf "%0.2f\n", stack[z]
}

# footnotes: exponentiation operators for differing awks...
#
# awk 'BEGIN {print 2 ^ 3}' should print 8 if using ^
# stack[z-1] ^= stack[z] works on some awks using ^
# stack[z-1] = stack[z-1] ^ stack[z] always works if using ^
#
# awk 'BEGIN {print 2 ** 3}' should print 8 if using **
# stack[z-1] **= stack[z] works on some awks using **
# stack[z-1] = stack[z-1] ** stack[z] always works if using **

# eof
--
:wq
Mike Sanders
Mike Sanders
2023-11-06 05:27:20 UTC
Permalink
[...]
quick update I've meaning to make...

https://busybox.neocities.org/notes/rpn.txt
--
:wq
Mike Sanders
Janis Papanagnou
2023-11-06 11:14:35 UTC
Permalink
Post by Mike Sanders
Ping Janis: Question... Are you interested in rewriting this
as a Gawk only implementation? Would be great for switch/case
statements IMO. If so, I'll add your version to the file at
my website.
This is no rocket science. If you have

if (some_variable = "Val1") Cmd1 ;
else if (some_variable = "Val2") Cmd2 ;
...
else Def ;

the switch statement (for simple string comparisons) looks like

switch (some_variable) {
case "Val1": Cmd1 ; break ;
case "Val2": Cmd2 ; break ;
...
default: Def ;
}

The point is that you have to provide and evaluate 'some_variable'
just once. The "disadvantage" is that you will need the 'break'
commands to not fall through to the next case (which is another
common feature of such statements but not necessary in your
case).

You can apply 'switch' in function RPN() and errormessage(), but
whether it makes sense or not to introduce 'switch' and sacrifice
portability is on you to decide.

GNU Awk has (as opposed to some other programming languages
like C) the advantage that you can compare not only scalars;
you can also compare strings and patterns in GNU Awk's switch:

switch (some_variable) [
case 42: ...
case "string": ...
case /pattern/: ...
}

I don't see any GNU Awk specific features/constructs that would
suggest a [non-portable] transcription.


One point you may want to consider is the trim() function; the
two substitutions can be combined in one

sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)

(but test that in your Awk versions before using it; "&" is an
old feature but off the top of my head I'm not sure whether the
subexpression with parenthesis /...(...).../ is generally
supported in other Awks).

Janis
Post by Mike Sanders
Folks, assuming Janis says yes, lets work off that version
& use it as a baseline.
At any rate, a work in progress...
[...]
{ RPN($0) }
function RPN(input, x, y, z, stack) {
split(input, stack, /[[:space:]]+/)
z = 0
for (x = 1; x in stack; ++x) {
y = stack[x]
if (y ~ /^[0-9]+(\.[0-9]+)?$/) stack[++z] = y
else {
if (z < 2) { print "error: insufficient operands"; return }
if (y == "+") stack[z-1] += stack[z]
else if (y == "-") stack[z-1] -= stack[z]
else if (y == "*") stack[z-1] *= stack[z]
else if (y == "^") stack[z-1] ^= stack[z] # see footnotes
else if (y == "/") {
if (stack[z] == 0) { print "error: division by zero"; return }
stack[z-1] /= stack[z]
} else if (y == "%") {
if (stack[z] == 0) { print "error: modulo by zero"; return }
stack[z-1] %= stack[z]
} else { print "error: invalid operator " y; return }
--z
}
}
if (z != 1) { print "error: invalid RPN expression"; return }
printf "%0.2f\n", stack[z]
}
[...]
Mike Sanders
2023-11-06 12:12:14 UTC
Permalink
Post by Janis Papanagnou
the switch statement (for simple string comparisons) looks like
switch (some_variable) {
case "Val1": Cmd1 ; break ;
case "Val2": Cmd2 ; break ;
...
default: Def ;
}
Can multiple cases be 'or'ed (combined) as in $SHELL?

case q|Q) quit(); break;
Post by Janis Papanagnou
I don't see any GNU Awk specific features/constructs that would
suggest a [non-portable] transcription.
Ok, so no.
Post by Janis Papanagnou
(but test that in your Awk versions before using it; "&" is an
old feature but off the top of my head I'm not sure whether the
subexpression with parenthesis /...(...).../ is generally
supported in other Awks).
'&' I did not know this, must study it further.

Thanks. Off to work for me.
--
:wq
Mike Sanders
Janis Papanagnou
2023-11-07 12:36:35 UTC
Permalink
Post by Mike Sanders
Post by Janis Papanagnou
the switch statement (for simple string comparisons) looks like
switch (some_variable) {
case "Val1": Cmd1 ; break ;
case "Val2": Cmd2 ; break ;
...
default: Def ;
}
Can multiple cases be 'or'ed (combined) as in $SHELL?
What's always possible is to use the same bulky way as other C based
languages:

case "q": case "Q":

in case of comparing patterns I'd try

case /q|Q/:

(adding anchors ^ and $ as necessary). For details see [*].
Post by Mike Sanders
case q|Q) quit(); break;
[...]

Janis

[*] https://www.gnu.org/software/gawk/manual/gawk.html#Switch-Statement
Mike Sanders
2023-11-07 18:46:27 UTC
Permalink
Post by Janis Papanagnou
[*] https://www.gnu.org/software/gawk/manual/gawk.html#Switch-Statement
Will read/study (as always thanks Janis!)
--
:wq
Mike Sanders
Ed Morton
2023-11-07 11:48:34 UTC
Permalink
Post by Janis Papanagnou
Post by Mike Sanders
Ping Janis: Question... Are you interested in rewriting this
as a Gawk only implementation? Would be great for switch/case
statements IMO. If so, I'll add your version to the file at
my website.
This is no rocket science. If you have
if (some_variable = "Val1") Cmd1 ;
else if (some_variable = "Val2") Cmd2 ;
...
else Def ;
the switch statement (for simple string comparisons) looks like
switch (some_variable) {
case "Val1": Cmd1 ; break ;
case "Val2": Cmd2 ; break ;
...
default: Def ;
}
The point is that you have to provide and evaluate 'some_variable'
just once. The "disadvantage" is that you will need the 'break'
commands to not fall through to the next case (which is another
common feature of such statements but not necessary in your
case).
You can apply 'switch' in function RPN() and errormessage(), but
whether it makes sense or not to introduce 'switch' and sacrifice
portability is on you to decide.
GNU Awk has (as opposed to some other programming languages
like C) the advantage that you can compare not only scalars;
switch (some_variable) [
case 42: ...
case "string": ...
case /pattern/: ...
Nitpick - it's

case /regexp/:

rather than:

case /pattern/

The word "pattern" is ambiguous and misused all over awk documentation.
You can make some argument for it in:

pattern { action }

since that includes `BEGIN`, integers, etc. I'd argue that should be:

condition { action }

but in the "case" statement above what goes inside `/.../` is simply and
always a regexp.
Post by Janis Papanagnou
}
I don't see any GNU Awk specific features/constructs that would
suggest a [non-portable] transcription.
One point you may want to consider is the trim() function; the
two substitutions can be combined in one
sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)
(but test that in your Awk versions before using it; "&" is an
old feature but off the top of my head I'm not sure whether the
subexpression with parenthesis /...(...).../ is generally
supported in other Awks).
You can write that, but it's not a capture group that can be
backreferenced from the replacement and if it was "&" wouldn't refer to
the string that matched ".*" anyway, it'd refer to the string that
matched the whole regexp.

You could use a capture group in GNU awk for gensub():

str = gensub (/^[[:space:]]+(.*)[[:space:]]+$/, "\\1", 1, str)

and in most awks you could do:

gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

but there are some out there that will fail to do both substitutions for
that case (tawk and nawk maybe?) and so you need:

sub(/^[[:space:]]+/, "", str)
sub(/[[:space:]]+$/, "", str)

for those but since this requires an awk that supports POSIX character
classes anyway you could just declare the script as portable to all
POSIX awks (rather than all "modern" awks) and not worry about ones that
fail given the above gsub().

Alternatively in any POSIX awk you could do:

match(str,/[^[:space:]](.*[^[:space:]])?/)
str = substr(str,RSTART,RLENGTH)

and I expect that'd also work in any awk that supports POSIX character
classes but does not support the above gsub().

Regards,

Ed.
Post by Janis Papanagnou
Janis
Post by Mike Sanders
Folks, assuming Janis says yes, lets work off that version
& use it as a baseline.
At any rate, a work in progress...
[...]
{ RPN($0) }
function RPN(input, x, y, z, stack) {
split(input, stack, /[[:space:]]+/)
z = 0
for (x = 1; x in stack; ++x) {
y = stack[x]
if (y ~ /^[0-9]+(\.[0-9]+)?$/) stack[++z] = y
else {
if (z < 2) { print "error: insufficient operands"; return }
if (y == "+") stack[z-1] += stack[z]
else if (y == "-") stack[z-1] -= stack[z]
else if (y == "*") stack[z-1] *= stack[z]
else if (y == "^") stack[z-1] ^= stack[z] # see footnotes
else if (y == "/") {
if (stack[z] == 0) { print "error: division by zero"; return }
stack[z-1] /= stack[z]
} else if (y == "%") {
if (stack[z] == 0) { print "error: modulo by zero"; return }
stack[z-1] %= stack[z]
} else { print "error: invalid operator " y; return }
--z
}
}
if (z != 1) { print "error: invalid RPN expression"; return }
printf "%0.2f\n", stack[z]
}
[...]
Janis Papanagnou
2023-11-07 13:02:27 UTC
Permalink
Post by Ed Morton
Post by Janis Papanagnou
switch (some_variable) [
case 42: ...
case "string": ...
case /pattern/: ...
Nitpick - it's
And I expected the nitpick on my typo using '=' (instead of '==') in
the 'if' comparison. ;-)
Post by Ed Morton
case /pattern/
The word "pattern" is ambiguous and misused all over awk documentation.
You are right that the historic (and, sadly, surviving) documentation
speaks about 'pattern' and '/regexp'/'.
Post by Ed Morton
pattern { action }
condition { action }
Decades ago I was the first one suggesting the term 'condition' here so
you certainly don't need to teach me. (Myself I was using that term in
my Awk courses even since the 1990's.)
Post by Ed Morton
but in the "case" statement above what goes inside `/.../` is simply and
always a regexp.
Yes. Sorry for my sloppiness here. Thanks for the nitpick.
Post by Ed Morton
[...]
Post by Janis Papanagnou
One point you may want to consider is the trim() function; the
two substitutions can be combined in one
sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)
(but test that in your Awk versions before using it; "&" is an
old feature but off the top of my head I'm not sure whether the
subexpression with parenthesis /...(...).../ is generally
supported in other Awks).
You can write that, but it's not a capture group that can be
backreferenced from the replacement and if it was "&" wouldn't refer to
the string that matched ".*" anyway, it'd refer to the string that
matched the whole regexp.
Using my Awk it does what advertised. I merely didn't find it clearly
documented whether the '&' is generally guaranteed to refer to the
grouping.
I deliberately abstained from gensub() here since the OP avoids GNU
Awk specifics.
Post by Ed Morton
str = gensub (/^[[:space:]]+(.*)[[:space:]]+$/, "\\1", 1, str)
gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)
Yes, this is a sensible alternative.
Post by Ed Morton
but there are some out there that will fail to do both substitutions for
Really? But why? - Alternatives in regexp is certainly an old feature
(at least since nawk in the 1980's).
Post by Ed Morton
sub(/^[[:space:]]+/, "", str)
sub(/[[:space:]]+$/, "", str)
[...]
Janis
Janis Papanagnou
2023-11-07 13:37:40 UTC
Permalink
Post by Janis Papanagnou
Post by Ed Morton
Post by Janis Papanagnou
One point you may want to consider is the trim() function; the
two substitutions can be combined in one
sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)
(but test that in your Awk versions before using it; "&" is an
old feature but off the top of my head I'm not sure whether the
subexpression with parenthesis /...(...).../ is generally
supported in other Awks).
You can write that, but it's not a capture group that can be
backreferenced from the replacement and if it was "&" wouldn't refer to
the string that matched ".*" anyway, it'd refer to the string that
matched the whole regexp.
Using my Awk it does what advertised. I merely didn't find it clearly
documented whether the '&' is generally guaranteed to refer to the
grouping.
I stand corrected, I had a wrong test case! - "&" does _not_ work
on grouping, and as you said (and as it is specified), it generally
defines the whole match. (It wouldn't make sense otherwise.) - Thanks!

Janis
Ed Morton
2023-11-08 10:22:28 UTC
Permalink
<snip>
Post by Janis Papanagnou
Post by Ed Morton
gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)
Yes, this is a sensible alternative.
Post by Ed Morton
but there are some out there that will fail to do both substitutions for
Really? But why? - Alternatives in regexp is certainly an old feature
(at least since nawk in the 1980's).
In my opinion it's just a bug. It was demonstrated to me when I posted
an answer on Stack Overflow several years ago that I can't find right
now. I know it's not gawk and I'm about 99% sure it's neither BSD awk
nor /usr/xpg[46]/bin/awk so the only non-oawk awks I can imagine would
have this problem are nawk, tawk (I'm about 80% sure I remember tawk is
one that DOES have the problem), and/or busybox awk, none of which I
have access to, so if anyone has and could test them by running:

$ echo ' foo ' | awk '{gsub(/^ +| +$/,""); print "<" $0 ">"}'
<foo>

and let us know which don't produce that output, that'd be great.

Ed.
Ed Morton
2023-11-08 11:41:01 UTC
Permalink
Post by Ed Morton
<snip>
Post by Janis Papanagnou
    gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)
Yes, this is a sensible alternative.
but there are some out there that will fail to do both substitutions for
Really? But why? - Alternatives in regexp is certainly an old feature
(at least since nawk in the 1980's).
In my opinion it's just a bug. It was demonstrated to me when I posted
an answer on Stack Overflow several years ago that I can't find right
now. I know it's not gawk and I'm about 99% sure it's neither BSD awk
nor /usr/xpg[46]/bin/awk so the only non-oawk awks I can imagine would
have this problem are nawk, tawk (I'm about 80% sure I remember tawk is
one that DOES have the problem), and/or busybox awk, none of which I
$ echo ' foo ' | awk '{gsub(/^ +| +$/,""); print "<" $0 ">"}'    <foo>
and let us know which don't produce that output, that'd be great.
    Ed.
I found a different answer I had posted (also years ago) that mentions
this bug and there I say it's tawk and mawk1 where it occurs. Still
can't find the original where I was told about it (and it was in a
discussion in comments that's probably been removed by now) but that
should be good enough for others to reproduce it.

The specific case was removing quotes from around a field in a CSV:

$ printf '"foo"' | awk '{gsub(/^"+|"+$/,""); print "<" $0 ">"}'
<foo>

but I doubt if that detail matters.

Regards,

Ed.

Kaz Kylheku
2023-11-07 17:24:38 UTC
Permalink
Post by Mike Sanders
Ping Janis: Question... Are you interested in rewriting this
as a Gawk only implementation? Would be great for switch/case
statements IMO. If so, I'll add your version to the file at
my website.
The cppawk preprocessor supports a case macro which
compiles to the switch statement for Gawk, or to portable
Awk code.

The macro is documented in its own man page:

https://www.kylheku.com/cgit/cppawk/tree/cppawk-case.1

case is safer than switch because it doesn't have implicit
"fallthrough".

Each case must end with one of: cbreak, cfall or cret: break,
fallthrough or return.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.
Mike Sanders
2023-11-07 18:50:51 UTC
Permalink
Post by Kaz Kylheku
The cppawk preprocessor supports a case macro which
compiles to the switch statement for Gawk, or to portable
Awk code.
https://www.kylheku.com/cgit/cppawk/tree/cppawk-case.1
case is safer than switch because it doesn't have implicit
"fallthrough".
Each case must end with one of: cbreak, cfall or cret: break,
fallthrough or return.
Hey-hey Kaz.

That's really nifty in fact. I might try my hand at a cppawk
project just to familiarizes myself with its workings.

Thanks & mucho appreciate the head's up kind sir, must study
more about it all.
--
:wq
Mike Sanders
Loading...