Issue
I want to find rows that have the same values in col1 and col2, and get their distinct third column
City,Sunset,Anna
City,Sunset,Anna
City,Sunset,Ben
City,Sunset,Ben
My desired output is:
City,Sunset,"Anna,Ben"
So far, I have this code
cat file.txt | sed 's/,/|/2' | awk -F"," '{if (!($1 in a)) a[$1] = $2; else a[$1]=a[$1] "," $2 } END { for (key in a) print key, a[key] }' OFS=,
But Im only getting
City,Sunset,Anna,Anna,Ben,Ben
When the output should be
City,Sunset,"Anna,Ben"
Could someone help me with this one?
Solution
Using any awk:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
key = $1 FS $2
val = $3
}
key != prev {
if ( NR > 1 ) {
print prev, "\"" vals "\""
}
vals = sep = ""
prev = key
delete seen
}
!seen[val]++ {
vals = vals sep val
sep = OFS
}
END {
print prev, "\"" vals "\""
}
$ awk -f tst.awk file
City,Sunset,"Anna,Ben"
That will work if you have multiple different values in the first 2 input fields, and regardless of the order of the 3rd field values.
It will output values in the order they appear in the input and only stores the values for one key pair at a time rather than storing the whole file.
For example, given this input:
$ cat file
City,Sunset,Anna
City,Sunset,Ben
City,Sunset,Sue
City,Sunset,Ben
Town,Sunrise,Ben
Town,Sunrise,Phil
it will produce what I assume is the expected output:
$ awk -f tst.awk file
City,Sunset,"Anna,Ben,Sue"
Town,Sunrise,"Ben,Phil"
The above script assumes your input is grouped by the values of the first 2 fields, if it isn't then run sort -t, -k1,2
on it first.
Answered By - Ed Morton Answer Checked By - Marie Seifert (WPSolving Admin)