Friday, April 15, 2022

[SOLVED] Standardizing the character strings in multiple rows (R or Unix)

Issue

I would like to standardize all the _xxxxxx character strings in the V1 column to the xxxxxxH format.

V1          V2      V3
122223H     20      Test kits
122224H     23      Test kits
122225H     42      Test kits
122227H     31      Test kits
_122228     23      Test kits
_122229     57      Test kits
_122231     21      Test kits
122232H     33      Test kits
122234H     22      Test kits
.......     ..      .... ....
.......     ..      .... ....
.......     ..      .... ....
122250H     33      Test kits

I tried to solve it with the gsub function in R but couldn't get the exact format I need. Any suggestions would be appreciated. Unix-based commands are also welcome.

df <- gsub("_","H",c(file$V1))

Output:

"H122228" "H122229" "H122231"

Desired output:

V1          V2      V3
122223H     20      Test kits
122224H     23      Test kits
122225H     42      Test kits
122227H     31      Test kits
122228H     23      Test kits
122229H     57      Test kits
122231H     21      Test kits
122232H     33      Test kits
122234H     22      Test kits
.......     ..      .... ....
.......     ..      .... ....
.......     ..      .... ....
122250H     33      Test kits

Solution

Try the following, though more elegant solutions may exist:

df <- data.frame(v1 = c("122223H","122224H","122225H","122227H","_122228","_122229"),
           v2 = c(21,23,42,31,23,57),
           v3 = rep("Test Kits", times = 6))


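# Remove the underscore, then append "H" to any value that does not already contain one
df$newstring <- gsub("_", "", df$v1, fixed = TRUE)
df$newstring <- ifelse(grepl("H", df$newstring, fixed = TRUE), df$newstring, paste0(df$newstring, "H"))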


# > df
#        v1 v2        v3 newstring
# 1 122223H 21 Test Kits   122223H
# 2 122224H 23 Test Kits   122224H
# 3 122225H 42 Test Kits   122225H
# 4 122227H 31 Test Kits   122227H
# 5 _122228 23 Test Kits   122228H
# 6 _122229 57 Test Kits   122229H
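
One possible one-step alternative, offered as a sketch rather than a definitive answer: sub() with a capture group can move the marker in a single call. It assumes every value that needs fixing starts with exactly one underscore and that the remaining values are already in the xxxxxxH format. The sample data frame is the same one used above.

```r
# Same sample data as in the answer above.
df <- data.frame(v1 = c("122223H", "122224H", "122225H", "122227H", "_122228", "_122229"),
                 v2 = c(21, 23, 42, 31, 23, 57),
                 v3 = rep("Test Kits", times = 6))

# "^_(.*)$" matches only strings that begin with "_" and captures the rest;
# "\\1H" writes the captured part back followed by "H".
# Strings that do not match the pattern are returned unchanged.
df$v1 <- sub("^_(.*)$", "\\1H", df$v1)
```

To apply this to the data from the question, the same call should work on file$V1 in place of df$v1.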


Answered By - jpsmith
Answer Checked By - Timothy Miller (WPSolving Admin)