Issue
I have a CSV file with the following header:
Unnamed: 0;Unnamed: 0.1;year;country;word;frequency;count;freq_prop_headlines;word_len;freq_rank;hfreq_rank;theme
I need to keep only the rows for the years 2010 and 2011.
I wrote this shell script:
grep -E '^([^;]*;){2}(2010|2011);' $1
Now I need to extend that regular expression so that it keeps every value in the "country" column except "all countries". I tried:
grep -E '^([^;]*;){2}(2010|2011);^all countries.*;' $1
But it didn't work.
Could you help me, please?
Thank you!!!
Edit:
Sample data:
Unnamed: 0;Unnamed: 0.1;year;country;word;frequency;count;freq_prop_headlines;word_len;freq_rank;hfreq_rank;theme
10450;61006;2020;all countries;relationship;603;381402;0.001581009;12;656;653;female stereotypes
10451;61007;2021;all countries;relationship;270;234227;0.001152728;12;656;653;female stereotypes
10452;61013;2010;all countries;burn;36;35448;0.001015572;4;657;657;crime and violence
10453;61014;2011;all countries;burn;75;58436;0.001283455;4;657;657;crime and violence
10454;61015;2012;all countries;burn;105;94038;0.00111657;4;657;657;crime and violence
8928;51085;2010;USA;gangrape;0;1912;0.0;8;856;856;crime and violence
8929;51086;2011;USA;gangrape;0;3274;0.0;8;856;856;crime and violence
The expected output is:
8928;51085;2010;USA;gangrape;0;1912;0.0;8;856;856;crime and violence
8929;51086;2011;USA;gangrape;0;3274;0.0;8;856;856;crime and violence
Solution
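awk is better suited to this job, since the input is row data with ;-separated columns. The command below splits each line on ; and keeps the rows whose third field is exactly 2010 or 2011 and whose fourth field is not "all countries".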
awk -F ';' '$3 ~ /^201[01]$/ && $4 != "all countries"' file
8928;51085;2010;USA;gangrape;0;1912;0.0;8;856;856;crime and violence
8929;51086;2011;USA;gangrape;0;3274;0.0;8;856;856;crime and violence
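If the header line should also survive the filter, one possible variant adds an NR == 1 test in front of the same condition. This is only a sketch, assuming the header is always the first line of the input; the rows and with_header variables are made up for illustration and reuse a few rows from the question.

```shell
# Hypothetical sample: the header plus four rows copied from the question.
rows='Unnamed: 0;Unnamed: 0.1;year;country;word;frequency;count;freq_prop_headlines;word_len;freq_rank;hfreq_rank;theme
10452;61013;2010;all countries;burn;36;35448;0.001015572;4;657;657;crime and violence
10454;61015;2012;all countries;burn;105;94038;0.00111657;4;657;657;crime and violence
8928;51085;2010;USA;gangrape;0;1912;0.0;8;856;856;crime and violence
8929;51086;2011;USA;gangrape;0;3274;0.0;8;856;856;crime and violence'

# NR == 1 lets the first line (the header) through unconditionally;
# every other line must pass the year and country tests.
with_header=$(printf '%s\n' "$rows" |
  awk -F ';' 'NR == 1 || ($3 ~ /^201[01]$/ && $4 != "all countries")')

printf '%s\n' "$with_header"
```

With a real file, the same awk program can be given the file name as its last argument instead of reading from a pipe.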
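Since the OP wants a solution that uses only grep -E, here is one. POSIX extended regular expressions have no negative lookahead, so the alternation spells out every way the country field can start differently from "all countries":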
grep -E '^([^;]*;){2}201[01];([^a]|a[^l]|al[^l]|all[^ ]|all [^c]|all c[^o]|all co[^u]|all cou[^n]|all coun[^t]|all count[^r]|all countr[^i]|all countri[^e]|all countrie[^s])' file
8928;51085;2010;USA;gangrape;0;1912;0.0;8;856;856;crime and violence
8929;51086;2011;USA;gangrape;0;3274;0.0;8;856;856;crime and violence
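If a pipeline of two grep -E calls is acceptable, the negation can be handed to the -v option instead of being encoded in the pattern. The following is a rough sketch under that assumption, not part of the original answer; the rows and filtered variables are made up for illustration and reuse rows from the question.

```shell
# Hypothetical sample: four rows copied from the question.
rows='10452;61013;2010;all countries;burn;36;35448;0.001015572;4;657;657;crime and violence
10454;61015;2012;all countries;burn;105;94038;0.00111657;4;657;657;crime and violence
8928;51085;2010;USA;gangrape;0;1912;0.0;8;856;856;crime and violence
8929;51086;2011;USA;gangrape;0;3274;0.0;8;856;856;crime and violence'

# Stage 1 keeps rows whose third field is 2010 or 2011.
# Stage 2 uses -v to drop rows whose fourth field is exactly "all countries".
filtered=$(printf '%s\n' "$rows" |
  grep -E '^([^;]*;){2}201[01];' |
  grep -Ev '^([^;]*;){3}all countries;')

printf '%s\n' "$filtered"
```

The second pattern is anchored with ^ and counts three ;-terminated fields, so it only looks at the country column and requires the whole field to equal "all countries" before a row is dropped.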
Answered By - anubhava Answer Checked By - Robin (WPSolving Admin)