Discussion:
awk not outputting in scientific notation despite %e
Ross
2023-05-07 21:36:04 UTC
I am using awk for some basic calculations and want to force my output to be in scientific notation. I am using OFMT="%.15e" to accomplish this. On most machines, I get the expected output:

$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
4.483923595133619e+26

But a version of awk on my cluster gives:

$ awk --version | head --lines=2
GNU Awk 4.0.2
Copyright (C) 1989, 1991-2012 Free Software Foundation.

$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
448392359513361882871234560

Why is this version/configuration of awk not outputting to scientific notation as requested? How can I portably get my desired result (4.483923595133619e+26)?

(This question was originally posted to Stack Overflow at https://stackoverflow.com/questions/76187717/awk-not-outputting-in-scientific-notation-despite-e but it didn't get much attention and I was directed here instead.)
Kaz Kylheku
2023-05-07 22:24:48 UTC
Post by Ross
$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
4.483923595133619e+26
$ awk --version | head --lines=2
GNU Awk 4.0.2
Copyright (C) 1989, 1991-2012 Free Software Foundation.
$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
448392359513361882871234560
Can repro on Ubuntu 18:

0:[0507:150838]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 3.1 }'
3.100000e+00
0:[0507:150845]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 10e50 }'
999999999999999993220948674361627976461708441944064

$ awk --version | head -2
GNU Awk 5.1.60, API: 3.2
Copyright (C) 1989, 1991-2021 Free Software Foundation.

This seems to be a "feature". It's as if the format is ignored for
integers (that being numbers which have a zero fractional part).

0:[0507:151109]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 1 }'
1
0:[0507:151117]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 100 }'
100
0:[0507:151121]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 300 }'
300
0:[0507:151123]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 1.0 }'
1
0:[0507:151128]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 1.0001 }'
1.000100e+00
0:[0507:151131]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 300.0001 }'
3.000001e+02

If so, I'd expect that to be documented. I don't see it. Moreover, OFMT
comes from POSIX, which doesn't say anything like this either.

It doesn't seem to have anything to do with whether the value is the
result of arithmetic:

0:[0507:152027]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 300.0 + 1.0 - 1.0 }'
300
0:[0507:152035]:sun-go:~/gawk$ awk 'BEGIN { OFMT = "%e"; print 0.25 * 4 }'
1

However, gawk could be constant-folding these expressions. Let's use
run-time data:

$ awk 'BEGIN { OFMT = "%e" }; { print $1 }'
1
1
1.5
1.5

$ awk 'BEGIN { OFMT = "%e" }; { print $1 + 0 }'
1
1
1.5
1.500000e+00

So OFMT is ignored entirely unless the input field is converted to a
number by a forced calculation, and even then it's still ignored for
values with a zero fractional part.
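
For run-time data like the above, a minimal sketch of sidestepping OFMT altogether: force the field to a number and format it with printf. The function name to_sci is made up for this example, and non-numeric input would simply come out as zero.

```sh
# to_sci is a hypothetical helper name used only for this sketch.
to_sci() {
    # Adding 0 forces a numeric value; printf then applies %e directly,
    # so OFMT and the integer special case never come into play.
    awk '{ printf "%e\n", $1 + 0 }'
}

printf '1\n1.5\n300\n' | to_sci
# 1.000000e+00
# 1.500000e+00
# 3.000000e+02
```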
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Janis Papanagnou
2023-05-07 22:29:30 UTC
Post by Ross
I am using awk for some basic calculations and want to force my
output to be in scientific notation. I am using OFMT="%.15e" to
$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
4.483923595133619e+26
$ awk --version | head --lines=2
GNU Awk 4.0.2
Copyright (C) 1989, 1991-2012 Free Software Foundation.
$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
448392359513361882871234560
Why is this version/configuration of awk not outputting to scientific
notation as requested? How can I portably get my desired result
(4.483923595133619e+26)?
(This question was originally posted to stackexchange at
https://stackoverflow.com/questions/76187717/awk-not-outputting-in-scientific-notation-despite-e
but it didn't get much attention and I was directed here instead.)
I can reproduce that with GNU awk versions 3, 4, and 5.
You've already got some analysis and explanations from Kaz. I suggest
resorting to printf (instead of using print) then...

$ awk 'BEGIN { printf "%.15e\n", 4.483923595133619e+29 / 1000 }'
4.483923595133619e+26
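
If the same formatting is needed in many places, the printf approach could be wrapped in a small user-defined function. This is only a sketch: sci() and its digits parameter are invented names, not anything awk provides.

```sh
sci_demo() {
    awk '
    # sci() is a hypothetical helper, not a built-in.
    function sci(x, digits) {
        # Build the format at run time, e.g. "%.15e".
        return sprintf("%." digits "e", x)
    }
    BEGIN {
        print sci(4.483923595133619e+29 / 1000, 15)   # 4.483923595133619e+26
        print sci(300, 3)                              # 3.000e+02
    }'
}

sci_demo
```

Because sci() returns a string, print emits it unchanged and OFMT is never consulted.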


Janis
Keith Thompson
2023-05-08 00:13:10 UTC
Post by Ross
I am using awk for some basic calculations and want to force my output
to be in scientific notation. I am using OFMT="%.15e" to accomplish
$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
4.483923595133619e+26
$ awk --version | head --lines=2
GNU Awk 4.0.2
Copyright (C) 1989, 1991-2012 Free Software Foundation.
$ awk 'BEGIN { OFMT = "%.15e"; print 4.483923595133619e+29 / 1000 }'
448392359513361882871234560
Why is this version/configuration of awk not outputting to scientific
notation as requested? How can I portably get my desired result
(4.483923595133619e+26)?
(This question was originally posted to stackexchange at
https://stackoverflow.com/questions/76187717/awk-not-outputting-in-scientific-notation-despite-e
but it didn't get much attention and I was directed here instead.)
I think you've found a longstanding bug in gawk.

Here's the test program I used:
```
BEGIN {
    x = 9
    y = 9.001
    z = 9e50

    printf("Using printf: %.15e\n", x)
    printf(" %.15e\n", y)
    printf(" %.15e\n", z)

    OFMT="%.15e"
    printf("Using OFMT: "); print(x)
    printf(" "); print(y)
    printf(" "); print(z)
}
```

The output should (I think) be the same for printf and print with OFMT.

On my Ubuntu system, all versions of gawk (I tried versions from 3.1.8
to 5.2.1 and the latest version from git) produced the same (incorrect)
output:
```
Using printf: 9.000000000000000e+00
9.000999999999999e+00
9.000000000000000e+50
Using OFMT: 9
9.000999999999999e+00
900000000000000027129553701548362001410714104758272
```

/usr/bin/original-awk has a similar bug but it shows up with different
values.

BTW, in your example you could replace "4.483923595133619e+29 / 1000" by
"4.483923595133619e+26"; it produces the same result and is a bit
simpler. With a bit of experimenting, I found that the output is
incorrect for 9, correct for values slightly larger than 9, and
incorrect for very large values.

I don't know what caused the bug. Looking into the gawk source, it does
its own output formatting rather than relying on sprintf, to avoid
problems with things like OFMT="%s" (not sure that's worthwhile, since
the behavior is undefined if OFMT isn't a floating-point conversion
specification).

I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
Keith Thompson
2023-05-08 01:29:03 UTC
Keith Thompson <Keith.S.Thompson+***@gmail.com> writes:
[...]
Post by Keith Thompson
I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
Here's the bug report:

https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00010.html
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
Keith Thompson
2023-05-08 02:59:20 UTC
Post by Keith Thompson
[...]
Post by Keith Thompson
I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00010.html
And thanks to a reply on the mailing list from Andrew J. Schorr:
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00011.html
I no longer believe this is a bug.

What I should have noticed is that the values for which setting
OFMT="%.15e" *doesn't* produce output in scientific notation are
precisely the values that are mathematically integers. Sufficiently
large floating-point values are always equal to integers.

Here's what POSIX says:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

A numeric value that is exactly equal to the value of an integer
(see Concepts Derived from the ISO C Standard) shall be converted to
a string by the equivalent of a call to the sprintf function (see
String Functions) with the string "%d" as the fmt argument and the
numeric value being converted as the first and only expr argument.
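
As a rough model of that rule (not gawk's actual code), the choice print makes can be imitated with a user-defined function. The name as_print_would() is invented for this sketch, and only small values are used because "%d" with very large values is handled differently by different awks.

```sh
posix_print_model() {
    awk '
    # as_print_would() is a hypothetical name; it imitates, for illustration
    # only, the quoted rule: "%d" for integral values, OFMT otherwise.
    function as_print_would(x) {
        if (x == int(x))
            return sprintf("%d", x)
        return sprintf(OFMT, x)
    }
    BEGIN {
        OFMT = "%.3e"
        print as_print_would(0.25 * 4)   # 1
        print as_print_would(1.5)        # 1.500e+00
        print as_print_would(300)        # 300
    }'
}

posix_print_model
```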

And here's a demonstration showing what's happening:
```
$ cat foo.awk
#!/usr/bin/awk -f

BEGIN {
    OFMT="%.16e"
    for (i = 50; i <= 55; i++) {
        x = 2 ** i - 0.5
        printf("2**%d - 0.5 = %.3f = ", i, x)
        print(x)
    }
}
$ ./foo.awk
2**50 - 0.5 = 1125899906842623.500 = 1.1258999068426235e+15
2**51 - 0.5 = 2251799813685247.500 = 2.2517998136852475e+15
2**52 - 0.5 = 4503599627370495.500 = 4.5035996273704955e+15
2**53 - 0.5 = 9007199254740992.000 = 9007199254740992
2**54 - 0.5 = 18014398509481984.000 = 18014398509481984
2**55 - 0.5 = 36028797018963968.000 = 36028797018963968
$
```

The value in the parent article was 4.483923595133619e+29 / 1000, which
is an exact integer value (whether evaluated mathematically or in double
precision floating point).
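
The same threshold can be checked without involving OFMT at all, by comparing each value with its own int(). This is a sketch that assumes IEEE double-precision arithmetic; kind() is a name made up for the example.

```sh
integral_demo() {
    awk '
    # kind() is a hypothetical helper that reports whether x has a
    # fractional part once it has been stored as a double.
    function kind(x) {
        if (x == int(x)) return "integral"
        return "fractional"
    }
    BEGIN {
        for (i = 51; i <= 54; i++) {
            x = 2^i - 0.5
            printf "2^%d - 0.5: %s\n", i, kind(x)
        }
    }'
}

integral_demo
# 2^51 - 0.5: fractional
# 2^52 - 0.5: fractional
# 2^53 - 0.5: integral
# 2^54 - 0.5: integral
```

From 2^53 - 0.5 upward the subtraction rounds to a whole number, because the spacing between adjacent doubles is at least 1 there.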

Using printf rather than setting OFMT is the *solution* to the original
problem, not a workaround for a bug.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
Kaz Kylheku
2023-05-08 05:14:06 UTC
Post by Keith Thompson
Post by Keith Thompson
[...]
Post by Keith Thompson
I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00010.html
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00011.html
I no longer believe this is a bug.
What I should have noticed is that the values for which setting
OFMT="%.15e" *doesn't* produce output in scientific notation are
precisely the values that are mathematically integers.
That doesn't mean formatting should break.

I don't see the requirement in POSIX that OFMT should bypass the format
for values without a fractional part.

In TXR Lisp, I made sure that even bignums work with ~e:

1> (typeof (expt 2 300))
bignum
2> (fmt "~e" (expt 2 300))
"2.037e90"

There are limitations: it works by conversion to floating-point.

So no, you cannot use it on arbitrarily large integers:

1> (fmt "~e" (expt 2 1000))
"1.072e301"
2> (fmt "~e" (expt 2 2000))
** out-of-range floating-point result
** during evaluation at expr-2:1 of form (fmt "~e" (expt 2 2000))

Better than nothing.
Post by Keith Thompson
large floating-point values are always equal to integers.
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
A numeric value that is exactly equal to the value of an integer
(see Concepts Derived from the ISO C Standard) shall be converted to
a string by the equivalent of a call to the sprintf function (see
String Functions) with the string "%d" as the fmt argument and the
numeric value being converted as the first and only expr argument.
I think that means, shall be converted to a string *when needed to be
one*. Not that integers should be considered to be strings, so that
OFMT is then bypassed.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Ross
2023-05-08 03:02:25 UTC
Post by Keith Thompson
[...]
I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00010.html
--
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
Ok, thanks for the quick and comprehensive response, everybody. I'll use printf as a workaround and keep an eye on the bug report.
Ross
2023-05-08 03:06:27 UTC
Post by Ross
Post by Keith Thompson
[...]
I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00010.html
--
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
Ok, thanks for the quick and comprehensive response, everybody. I'll use printf as a workaround and keep an eye on the bug report.
Ah, that response from Andrew makes sense and I appreciate the clarification.
Keith Thompson
2023-05-08 03:10:35 UTC
Post by Ross
Post by Keith Thompson
[...]
I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00010.html
Ok, thanks for the quick and comprehensive response, everybody. I'll
use printf as a workaround and keep an eye on the bug report.
See my latest followup. It's not a bug, and using printf is the correct
solution, not just a workaround. (One could argue that it's a
misfeature in the awk language, but it's hard to avoid in a language
that distinguishes integers from non-integers by value rather than by
type.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
Kaz Kylheku
2023-05-08 07:36:54 UTC
Post by Keith Thompson
Post by Ross
Post by Keith Thompson
[...]
I've submitted a bug report to the bug-gawk mailing list. I'll post the
URL when the archive updates.
https://lists.gnu.org/archive/html/bug-gawk/2023-05/msg00010.html
Ok, thanks for the quick and comprehensive response, everybody. I'll
use printf as a workaround and keep an eye on the bug report.
See my latest followup. It's not a bug, and using printf is the correct
solution, not just a workaround. (One could argue that it's a
misfeature in the awk language, but it's hard to avoid in a language
that distinguishes integers from non-integers by value rather than by
type.)
But printf is in the language, and printf("%e", expr) handles it fine.

The OFMT feature just has to treat numeric-valued expressions
using whatever logic that is already making printf("%e", expr) work.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Andrew Schorr
2023-05-08 15:08:20 UTC
Hi,
Post by Kaz Kylheku
But printf is in the language, and printf("%e", expr) handles it fine.
The OFMT feature just has to treat numeric-valued expressions
using whatever logic that is already making printf("%e", expr) work.
I thought Keith clarified this issue above. The POSIX spec says that integer values
should be converted with an implicit "%d" instead of using CONVFMT.
It is also discussed here in the gawk manual:
https://www.gnu.org/software/gawk/manual/html_node/Strings-And-Numbers.html
"As a special case, if a number is an integer, then the result of converting it to a string is always an integer, no matter what the value of CONVFMT may be."

You may ask why OFMT and CONVFMT are handled the same way. I can think of 3 reasons:
1. It's the same code path inside gawk.
2. Historically, there was no distinction between OFMT and CONVFMT, but it was added to address
some corner cases. You can find a discussion of this in the POSIX awk spec.
3. The POSIX awk spec simply codifies how Unix awk has always worked, and this is how it works.

If you don't like it, please use printf to request explicitly what you want. The OFMT and CONVFMT logic
is intended to do the right thing, and it does so in the vast majority of usage cases.

Regards,
Andy
Kaz Kylheku
2023-05-08 23:31:00 UTC
Post by Andrew Schorr
Hi,
Post by Kaz Kylheku
But printf is in the language, and printf("%e", expr) handles it fine.
The OFMT feature just has to treat numeric-valued expressions
using whatever logic that is already making printf("%e", expr) work.
I thought Keith clarified this issue above. The POSIX spec says that integer values
should be converted with an implicit "%d" instead of using CONVFMT.
The requirement applies to CONVFMT; it isn't written that it applies to
OFMT.

It is written that CONVFMT is newer: OFMT came first and then CONVFMT
was derived from it as a kind of fork to separate the messy semantics
of field conversion from printing arbitrary arguments (or something
like that).

It is not clear that requirements applying to the derivative CONVFMT
should flow backwards into OFMT. Generally speaking, if a document
introduces some Y that is a new entity, similar to and inheriting
requirements from an existing X, and then also gives requirements only
about Y, those Y requirements do not propagate back to X; they
are one of the attributes of Y that distinguish it from X.

Under CONVFMT, the results of conversion are not going to the display;
they loop back into the program. CONVFMT controls how numbers convert to
strings. If the floating-point format applies to integer values, it can
mess up associative array keys involving integers, which is quite
common.

Therefore, it's easy to see why a hack like that would be in CONVFMT,
but not required in OFMT.
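
To illustrate the array-key point, a small sketch (the array name and values are arbitrary): with CONVFMT set to a floating-point format, an integral subscript is still stored under its plain integer spelling, while a non-integral one goes through CONVFMT.

```sh
subscript_demo() {
    awk '
    BEGIN {
        CONVFMT = "%e"
        a[1] = "one"       # integral subscript: stored under the key "1"
        a[0.5] = "half"    # non-integral subscript: converted with CONVFMT
        if ("1" in a) print "found key 1"
        if ("5.000000e-01" in a) print "found key 5.000000e-01"
    }'
}

subscript_demo
# found key 1
# found key 5.000000e-01
```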

Upon reading the rationale, one might have the impression that
this is one of those corner cases that originally existed in OFMT
that were separated out into CONVFMT.

Therefore, historic knowledge that OFMT implementations before CONVFMT
existed had this hack behavior doesn't amount to anything. POSIX
changed the behavior by introducing an entirely new variable,
such that OFMT no longer controlled conversions, breaking programs
depending on that. At that time, OFMT should have been considered
to not have the %d requirement any more, in the same stroke.
Ben Bacarisse
2023-05-09 00:59:42 UTC
Post by Kaz Kylheku
Post by Andrew Schorr
Hi,
Post by Kaz Kylheku
But printf is in the language, and printf("%e", expr) handles it fine.
The OFMT feature just has to treat numeric-valued expressions
using whatever logic that is already making printf("%e", expr) work.
I thought Keith clarified this issue above. The POSIX spec says that
integer values should be converted with an implicit "%d" instead of
using CONVFMT.
The requirement applies to CONVFMT; it isn't written that it applies to
OFMT.
I think it is. Where OFMT applies (in print) the text refers to the
conversion that otherwise uses CONVFMT:

"All expression arguments shall be taken as strings, being converted if
necessary; this conversion shall be as described in Expressions in
awk, with the exception that the printf format in OFMT shall be used
instead of the value in CONVFMT."

and it's the referenced text that has the integer exception with CONVFMT
used for other values. Using OFMT in place of CONVFMT in that text does
not remove the exception.

Of course, if the integer exception was added to the "Expressions in awk"
section at some later stage, the effect on the use of OFMT in print
statements might have been unintentional.
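
A small sketch of that reading, with arbitrary formats and values: print applies OFMT where a string conversion would otherwise apply CONVFMT, and the integer exception holds in both cases.

```sh
ofmt_vs_convfmt() {
    awk '
    BEGIN {
        OFMT = "%.2e"; CONVFMT = "%.4f"
        x = 12.5; n = 12
        print x       # number printed directly: OFMT    -> 1.25e+01
        print x ""    # concatenation first:     CONVFMT -> 12.5000
        print n       # integral value:          "%d"    -> 12
        print n ""    # integral value:          "%d"    -> 12
    }'
}

ofmt_vs_convfmt
```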
--
Ben.
Andrew Schorr
2023-05-09 13:03:49 UTC
Post by Ben Bacarisse
I think it is. Where OFMT applies (in print) the text refers to the
conversion that otherwise uses CONVFMT:
"All expression arguments shall be taken as strings, being converted if
necessary; this conversion shall be as described in Expressions in
awk, with the exception that the printf format in OFMT shall be used
instead of the value in CONVFMT."
and it's the referenced text that has the integer exception with CONVFMT
used for other values. Using OFMT in place of CONVFMT in that text does
not remove the exception.
I think that's right. It also says this:

"The intent has been to specify historical practice in almost all cases."

And I think the history is that all implementations of awk have always converted
integral values using the equivalent of "%d", regardless of the CONVFMT or OFMT setting.

Regards,
Andy
Kpop 2GM
2023-10-27 22:13:58 UTC
Post by Andrew Schorr
I think it is. Where OFMT applies (in print) the text refers to the
"All expression arguments shall be taken as strings, being converted if
necessary; this conversion shall be as described in Expressions in
awk, with the exception that the printf format in OFMT shall be used
instead of the value in CONVFMT."
and it's the referenced text that has the integer exception with CONVFMT
used for other values. Using OFMT in place of CONVFMT in that text does
not remove the exception.
"The intent has been to specify historical practice in almost all cases."
And I think the history is that all implementations of awk have always converted
integral values using the equivalent of "%d", regardless of the CONVFMT or OFMT setting.
Regards,
Andy
That would *nearly* be true if not for gawk + GMP's lovely behavior when parsing the decimal dot:

gawk -Mbe 'BEGIN { OFS = RS;
    print x = 10.^10 * 5^12 * 3, --x, "--------------",
          y = 10^10 * 5^12 * 3, --y, "-----------------", 2^63-1 }'

7324218750000000000
7324218750000000000
--------------
7324218750000000000
7324218749999999999
-----------------
9223372036854775807

"x" and "y" are mathematically identical, and both are ***supposed*** to be integers, but the mere presence of an extra dot (".") after the first "10" ("10." instead of "10") causes gawk + GMP to treat the whole expression as a double-precision floating-point value instead of a GMP arbitrary-precision integer.

This can be confirmed by using the -d- flag to dump variables to /dev/stdout:

x: 7.32421875e+18
y: 7324218749999999999

The same undesirable effect also occurs when the exponent is written as "10^10." instead of "10^10". Interestingly enough, adding

    x = int(x)

before

    --x

circumvents this annoyance and returns a properly decremented integer value.

— The 4Chan Teller
