Issue
I thought I understood grep well: it finds the matches, and I use them as a filter to select or delete the rows that contain them. But in some cases I don't get the expected result. Here are some examples:
T_fil [grep('\\b(vp)\\b', T_fil$Int),]
# ok, leave the row that has vp
T_fil [-grep('\\b(vp)\\b', T_fil$Int),]
# ok, remove the ones with vp
T_fil [grep('\\b(q)\\b', T_fil$Int),]
# good, don't select any rows because none have q
T_fil [-grep('\\b(q)\\b', T_fil$Int),]
# error, it deletes all the rows, but it shouldn't delete any, because none have q
T_fil [grep('\\b()\\b', T_fil$Int),]
# good, selects all rows; I don't understand why, but with an empty pattern I want no changes to be made
T_fil [-grep('\\b()\\b', T_fil$Int),]
# error, it deletes all the rows, whereas with an empty pattern I want no changes to be made
Can someone explain this behavior to me and what can I do to make the result correct?
Solution
The issue is that, if grep finds no hit, it returns an empty vector. A negative empty vector is still an empty vector. You then use that to select rows, and nothing gets selected.
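To see the empty-vector behaviour in isolation, here is a minimal sketch. The data frame is a hypothetical stand-in for T_fil; only the column name Int comes from the question, and the values are made up for illustration.

```r
# Hypothetical stand-in for the asker's data (values are invented).
T_fil <- data.frame(id = 1:3,
                    Int = c("vp a", "b c", "d vp"),
                    stringsAsFactors = FALSE)

# No element contains the word "q", so grep returns a zero-length integer vector.
hits <- grep("\\b(q)\\b", T_fil$Int)

# Negating a zero-length vector gives a zero-length vector again.
neg_hits <- -hits

# Indexing rows with a zero-length vector selects zero rows,
# instead of "dropping nothing" as intended.
result <- T_fil[neg_hits, ]
print(nrow(result))
```

So the negation has nothing to negate: R never sees a "remove these rows" request, only an empty selection.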
In reality, grep is bad for selecting elements for this reason. A better function that works almost the same is grepl, which returns a logical vector instead, and the result of which can be inverted with !:
T_fil [grepl('\\b(q)\\b', T_fil$Int),]
# good, don't select any rows because none have q
T_fil [!grepl('\\b(q)\\b', T_fil$Int),]
# good, select all rows
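The same two calls can be checked on a small example. This is only a sketch: the data frame below is a hypothetical stand-in for T_fil with invented values.

```r
# Hypothetical stand-in for the asker's data (values are invented).
T_fil <- data.frame(id = 1:3,
                    Int = c("vp a", "b c", "d vp"),
                    stringsAsFactors = FALSE)

# grepl returns one TRUE/FALSE per element, even when nothing matches.
has_q  <- grepl("\\b(q)\\b",  T_fil$Int)
has_vp <- grepl("\\b(vp)\\b", T_fil$Int)

# Inverting a logical vector is always well defined.
without_q  <- T_fil[!has_q, ]   # nothing matched, so every row is kept
without_vp <- T_fil[!has_vp, ]  # only the row without "vp" is kept

print(nrow(without_q))
print(without_vp$Int)
```

Because the logical vector always has one entry per row, the "no match" case needs no special handling.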
Alternatively, you could also pass invert = TRUE to grep to obtain the same result. That is: do not use -grep(…) to invert the result of a query; it’s unreliable. Instead, either use grep(…, invert = TRUE) or use !grepl(…).
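As a quick check that the two recommended forms agree, here is a sketch on a hypothetical stand-in for T_fil (invented values; only the column name Int is from the question).

```r
# Hypothetical stand-in for the asker's data (values are invented).
T_fil <- data.frame(id = 1:3,
                    Int = c("vp a", "b c", "d vp"),
                    stringsAsFactors = FALSE)

# invert = TRUE makes grep return the indices of the NON-matching elements.
keep_q  <- grep("\\b(q)\\b",  T_fil$Int, invert = TRUE)  # nothing matches: all indices
keep_vp <- grep("\\b(vp)\\b", T_fil$Int, invert = TRUE)  # only the row without "vp"

# The logical route selects the same positions.
keep_vp_logical <- which(!grepl("\\b(vp)\\b", T_fil$Int))

print(T_fil[keep_q, ])
print(T_fil[keep_vp, ])
```

Both forms keep every row when the pattern matches nothing, which is the behaviour the question asks for.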
Answered By - Konrad Rudolph Answer Checked By - Marie Seifert (WPSolving Admin)