Discussion:
Unique In Column
Mike Sanders
2023-10-02 07:10:04 UTC
# verifies an item is unique to the 2nd column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange

BEGIN { FS = ",[ \t]*|[ \t]+" }

{ Field2Values[tolower($2)] = 1 }

END { if (uniqueItem("apple", FILENAME) != 0) exit 1 }

function uniqueItem(field2, file) {

    lowerField2 = tolower(field2)

    if (lowerField2 in Field2Values) {
        print "Error: '" field2 "' was found in 2nd column of " file
        return 1
    } else print "Item: '" field2 "' is unique to 2nd column of " file

    return 0
}

# eof
--
:wq
Mike Sanders
Mike Sanders
2023-10-02 07:37:03 UTC
Post by Mike Sanders
# verifies an item is unique to the 2nd column
quick update, why hard-code a field number anyhow?

# verifies an item is unique to a given column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange

BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)] = 1 }

END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file

    return 0
}

# eof
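
One caveat: -v assignments are processed before BEGIN runs, so an unconditional COL = 2 in BEGIN overrides any column given on the command line. A minimal sketch of one way around that, assuming a default of 2 only when COL was not supplied. The file name demo.csv, the check function, the item variable and the "found"/"not found" messages are made up for illustration and are not part of the script above:

```shell
# demo.csv is a hypothetical sample file mirroring the example data
printf '%s\n' 'john, kiwi' 'suzi, apple' 'suzi, orange' > demo.csv

# check COLUMN ITEM: report whether ITEM occurs in COLUMN of demo.csv
check() {
    awk -v COL="$1" -v item="$2" '
        # fall back to column 2 only when COL was not given (or is empty)
        BEGIN { FS = ",[ \t]*|[ \t]+"; if (COL == "") COL = 2 }
        { seen[tolower($COL)] = 1 }
        END { print ((tolower(item) in seen) ? "found" : "not found") }
    ' demo.csv
}

check 1 suzi      # column 1 holds the names
check 2 suzi      # column 2 holds the aliases
check "" apple    # empty COL falls back to column 2
```

This still only tests for presence, like the script above; it just lets the column be chosen at run time.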
--
:wq
Mike Sanders
Ed Morton
2023-11-05 15:26:30 UTC
Post by Mike Sanders
Post by Mike Sanders
# verifies an item is unique to the 2nd column
quick update, why hard-code a field number anyhow?
# verifies an item is unique to a given column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }
{ FieldValues[tolower($COL)] = 1 }
END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }
function uniqueItem(col, field, file) {
lowerField = tolower(field)
if(lowerField in FieldValues) {
print "Error: '" field "' was found in column " col " of " file
return 1
} else print "Item: '" field "' is unique to column " col " of " file
return 0
}
# eof
That's checking whether or not a value exists, not whether or not it's
unique, so it produces the wrong output. If we modify it to take the
value to look for from a variable, fruit:

$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)] = 1 }

END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file

    return 0
}

and add a second "apple" in column 2 of your CSV:

$ cat file.csv
john, kiwi
suzi, apple
suzi, orange
gwen, apple

then we can run it as:

$ awk -v fruit='kiwi' -f tst.awk file.csv
Error: 'kiwi' was found in column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Item: 'grape' is unique to column 2 of file.csv

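and you can see it's reporting that "grape" is a unique value when it's
not actually present at all, while "kiwi" (which really does occur
exactly once) is reported as an error, the same as the duplicated "apple".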
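
The underlying reason is that assigning 1 only records that a value was seen and throws away how many times. A minimal sketch of the difference, tallying occurrences instead. The names tally, count, want and demo.csv are illustrative only, not taken from the script:

```shell
# demo.csv is a hypothetical copy of the four-line sample data
printf '%s\n' 'john, kiwi' 'suzi, apple' 'suzi, orange' 'gwen, apple' > demo.csv

tally() {
    awk '
        BEGIN { FS = ",[ \t]*|[ \t]+" }
        { count[tolower($2)]++ }    # tally each value instead of flagging it
        END {
            # test with "in" first: merely referencing count[v] for a
            # missing key would create that element
            n = split("kiwi apple grape", want, " ")
            for (i = 1; i <= n; i++) {
                v = want[i]
                print v, ((v in count) ? count[v] : 0)
            }
        }
    ' demo.csv
}

tally
```

With counts available, "unique" can be taken to mean a count of exactly 1, as distinct from 0 (absent) or more than 1 (duplicated).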

If we change the script to:

$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)]++ }

END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        if (FieldValues[lowerField] == 1) {
            print "Item: '" field "' is unique to column " col " of " file
        }
        else {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        }
    }
    else {
        print "Error: '" field "' was not found in column " col " of " file
    }

    return 0
}

THEN it'll report unique "fruit" values correctly as well as reporting
which are present/absent:

$ awk -v fruit='kiwi' -f tst.awk file.csv
Item: 'kiwi' is unique to column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Error: 'grape' was not found in column 2 of file.csv
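
Note that in the script as written the "not found" case still returns 0, so the exit status alone cannot tell "unique" from "absent". If a calling script needs to distinguish the three cases, one possible variation is sketched below. The status values 0/1/2, the column_status function and demo.csv are arbitrary choices for illustration, not part of the script above:

```shell
# demo.csv is a hypothetical copy of the four-line sample data
printf '%s\n' 'john, kiwi' 'suzi, apple' 'suzi, orange' 'gwen, apple' > demo.csv

# exit status: 0 = occurs exactly once, 1 = duplicated, 2 = absent
column_status() {
    awk -v COL=2 -v item="$1" '
        BEGIN { FS = ",[ \t]*|[ \t]+" }
        { count[tolower($COL)]++ }
        END {
            key = tolower(item)
            if (!(key in count)) exit 2
            exit (count[key] == 1 ? 0 : 1)
        }
    ' demo.csv
}

for f in kiwi apple grape; do
    column_status "$f" && rc=0 || rc=$?
    echo "$f: $rc"
done
```

The messages are left out here to keep the sketch short; they could be printed in END just before each exit.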

Regards,

Ed.
Mike Sanders
2023-11-06 03:14:17 UTC
[...]
Thanks Ed, must study your example & mull it over =)
--
:wq
Mike Sanders