Monday, July 11, 2022

[SOLVED] Extract valid numbers from character vector in R

Issue

Suppose I have the following character vector:

c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

I want to extract only the valid numbers from this vector:

c("4", "-21", "6.5", "-2.2")

Note: "7. 5" has a space between the . and the 5, so it is not a valid number.

I tried the regex /^-?(0|[1-9]\\d*)(\\.\\d+)?$/, which is given here, but had no luck.

So what would be the regex to extract valid numbers from a character vector?


Solution

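We can use grep with a pattern that matches an optional leading minus sign followed by one or more digits or dots, anchored from the start (^) to the end ($) of the string

v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
grep("^-?[0-9.]+$", v1, value = TRUE)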
[1] "4"    "-21"  "6.5"  "-2.2"
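
The character class [0-9.] puts digits and dots on an equal footing, so this first pattern is fairly permissive. A quick check, as a sketch (the extra test strings "4.1.1" and "." and the variable names are added here for illustration and are not part of the original question):

```r
v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

# Loose pattern from above: optional minus, then any mix of digits and dots
loose <- "^-?[0-9.]+$"

# "4.1.1" and "." are not valid numbers, yet both get through
loose_hits <- grep(loose, c(v1, "4.1.1", "."), value = TRUE)
loose_hits
# returns "4", "-21", "6.5", "-2.2", "4.1.1", "."
```

This is fine when the input is known to contain at most one dot per element, but it does not validate the number's shape.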

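For fringe cases such as "4.1.1" (more than one dot), a stricter pattern allows at most one decimal part. Note that [ -]? also accepts a single leading space in place of the minus sign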

grep("^[ -]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1"), value = TRUE)
[1] "4"    "-21"  "6.5"  "-2.2"

grep("^[ -]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1", " 2.9"), value = TRUE)
[1] "4"    "-21"  "6.5"  "-2.2" " 2.9"
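
As a side note on the attempt in the question: the surrounding / characters are regex-literal delimiters from languages such as JavaScript. In R the pattern is an ordinary string, so the slashes are matched literally. The question does not show the exact call, so the sketch below assumes the pattern was passed to grep with the slashes included; the variable names are illustrative.

```r
v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

# Assumption: the pattern was used with its "/" delimiters intact.
# No element of v1 contains a "/", so nothing can match.
with_slashes <- grep("/^-?(0|[1-9]\\d*)(\\.\\d+)?$/", v1, value = TRUE)

# The same pattern without the delimiters
without_slashes <- grep("^-?(0|[1-9]\\d*)(\\.\\d+)?$", v1, value = TRUE)
without_slashes
# returns "4", "-21", "6.5", "-2.2"
```

Unlike the patterns above, this one also rejects integer parts with leading zeros (for example "007"), because of the (0|[1-9]\\d*) group.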


Answered By - akrun
Answer Checked By - Willingham (WPSolving Volunteer)