Issue
I would like to write a bash script, using awk, to determine how many lines start with each character.
Sample input: ./script.sh txt1 txt2 text1 text2
(filenames could be random too)
txt1
asdaga
dasdag
asdasdag
awqr
zvvbrh
tqetvh
xbrrte
txt2
npoajd
pojta
pskdna
nghir
asdt
bmkgjk
Sample output:
--- txt1 ---
a : 3
b : 0
c : 0
...
z : 1
...
ascii255 : 0
--- txt2 ---
a : 1
b : 1
...
p : 2
...
--- text3 ---
etc
where [character] : [number of rows that start with that character]
is the correct format.
After printing the results for every file one by one, I would also like to print a collective result in the same format, where each character's count is the sum of that character's counts across all the text files. In the given example (for only txt1 and txt2) the output would be:
a : 4
b : 1
...
(Explanation: txt1 contains 3 lines that start with a, and txt2 contains 1 line that starts with a, so the total is 3+1 = 4.)
Here is the code that I wrote, but I am stuck: it doesn't work, and I am confused by the awk syntax:
#!/bin/bash
awk '
{split($0,arr)
n=length(arr)
for(i=1;i<=255;i++){
char[i]=0;
}
for(i=1;i<=n;i++){
actchar=substr(1,1,1);
char[actchar]++;
printf("--- %s ---\n",FILENAME);
for(j=1;j<=255;j++){
prinf("%c : %s\n",j,char[j]);
}
}
'
Solution
This may be what you're trying to do, using any awk:
$ cat tst.sh
#!/usr/bin/env bash
awk '
    {
        # Count each line by its first character, separately per input file.
        char = substr($0,1,1)
        cnt[FILENAME,char]++
    }
    END {
        OFS = " : "
        beg = 97     # character code of "a"
        end = 122    # character code of "z"
        # Report per-file counts in command-line order and accumulate totals.
        for ( fileNr=1; fileNr<ARGC; fileNr++ ) {
            fname = ARGV[fileNr]
            print "--- " fname " ---"
            for ( charNr=beg; charNr<=end; charNr++ ) {
                char = sprintf("%c", charNr)
                print char, cnt[fname,char]+0
                tot[char] += cnt[fname,char]
            }
        }
        print "--- Total ---"
        for ( charNr=beg; charNr<=end; charNr++ ) {
            char = sprintf("%c", charNr)
            print char, tot[char]
        }
    }
' "${@:--}"
$ ./tst.sh txt1 txt2
--- txt1 ---
a : 3
b : 0
c : 0
d : 1
e : 0
f : 0
g : 0
h : 0
i : 0
j : 0
k : 0
l : 0
m : 0
n : 0
o : 0
p : 0
q : 0
r : 0
s : 0
t : 1
u : 0
v : 0
w : 0
x : 1
y : 0
z : 1
--- txt2 ---
a : 1
b : 1
c : 0
d : 0
e : 0
f : 0
g : 0
h : 0
i : 0
j : 0
k : 0
l : 0
m : 0
n : 2
o : 0
p : 2
q : 0
r : 0
s : 0
t : 0
u : 0
v : 0
w : 0
x : 0
y : 0
z : 0
--- Total ---
a : 4
b : 1
c : 0
d : 1
e : 0
f : 0
g : 0
h : 0
i : 0
j : 0
k : 0
l : 0
m : 0
n : 2
o : 0
p : 2
q : 0
r : 0
s : 0
t : 1
u : 0
v : 0
w : 0
x : 1
y : 0
z : 1
If you want to loop over some larger range of characters, just change the beg and end variable settings.
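As a rough sketch of that adjustment, the range could also be supplied from the command line rather than edited inside the script. The wrapper below is not part of the original answer: the function name count_first_chars and the choice to take beg and end as its first two arguments are assumptions made for illustration. The awk logic is otherwise the same as in tst.sh above, with the values passed in through awk's standard -v option.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: count_first_chars BEG END FILE...
# BEG and END are numeric character codes (e.g. 97 and 122 for a-z).
count_first_chars() {
    local beg=$1 end=$2
    shift 2
    awk -v beg="$beg" -v end="$end" '
    {
        # Tally each line under its file name and first character.
        cnt[FILENAME, substr($0,1,1)]++
    }
    END {
        OFS = " : "
        # Adding 0 forces a numeric value so %c prints the character
        # with that code rather than the first digit of a string.
        first = beg + 0
        last  = end + 0
        for ( fileNr=1; fileNr<ARGC; fileNr++ ) {
            fname = ARGV[fileNr]
            print "--- " fname " ---"
            for ( charNr=first; charNr<=last; charNr++ ) {
                char = sprintf("%c", charNr)
                print char, cnt[fname,char]+0
                tot[char] += cnt[fname,char]
            }
        }
        print "--- Total ---"
        for ( charNr=first; charNr<=last; charNr++ ) {
            char = sprintf("%c", charNr)
            print char, tot[char]+0
        }
    }
    ' "$@"
}
```

With this in place, something like count_first_chars 65 90 txt1 txt2 would report on uppercase letters, and count_first_chars 33 126 txt1 txt2 on all printable non-space ASCII characters, without touching the awk code.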
Answered By - Ed Morton
Answer Checked By - Senaida (WPSolving Volunteer)