Issue
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.zip
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_2_fastqc.html
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_2_fastqc.zip
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_1_fastqc.html
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_1_fastqc.zip
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_2_fastqc.html
Above are the names of the files in a folder. I want to remove everything between the second _ from the left and the second _ from the right, so that the output looks like:
S_1004__1_fastqc.html
S_1004__1_fastqc.zip
S_1004__2_fastqc.html
S_1004__2_fastqc.zip
S_1006__1_fastqc.html
S_1006__1_fastqc.zip
S_1006__2_fastqc.html
How do I do this using bash?
I tried the following code:
for file in *.html *.zip; do
new_name=$(echo "$file" | sed 's/_[^_]*_/_/')
mv "$file" "$new_name"
done
but it did not work the way I want.
Solution
You don't need sed for this (but see the end of this answer for an explanation of why your attempt fails, and for a working sed command).
With a recent enough bash (at least 3.0, for [[ string =~ regexp ]] and BASH_REMATCH):
for f in *.html *.zip; do
    # Group 1: first two fields with their underscores; group 3: last two fields
    [[ "$f" =~ ^(([^_]*_){2}).+((_[^_]*){2})$ ]] && mv "$f" "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
done
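To check what the regular expression captures before renaming anything, a dry run can help. The following is a minimal sketch: the function name new_name is hypothetical (it is not part of the answer above), and it only prints the proposed name instead of calling mv.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the shortened name, or return 1 if the name
# has too few underscore-separated fields for the pattern to match.
new_name() {
    local re='^(([^_]*_){2}).+((_[^_]*){2})$'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[1]: first two fields, each followed by "_"
    # BASH_REMATCH[3]: last two fields, each preceded by "_"
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
}

new_name 'S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html'
# prints: S_1004__1_fastqc.html
```

Storing the pattern in a variable and expanding it unquoted on the right-hand side of =~ is a common way to keep the regular expression readable.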
With an older bash:
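for f in *.html *.zip; do
    # Split the name on "_" into array a, with pathname expansion disabled
    set -f; ifs="$IFS"; IFS=_ a=( $f ); IFS="$ifs"; set +f; n="${#a[@]}"
    # Keep the first two and the last two fields
    (( n > 4 )) && mv "$f" "${a[0]}_${a[1]}__${a[n-2]}_${a[n-1]}"
done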
Note: set -f; ...; set +f temporarily suppresses pathname expansion. This is needed because $f is expanded unquoted and your file names could contain glob operators (*, ?, [...]).
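The same field-splitting idea can be wrapped in a function, so that IFS is made local instead of being saved and restored by hand. This is only a sketch under that assumption; the function name split_name is hypothetical, and it prints the new name rather than renaming anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the name on "_" and keep the first two and
# the last two fields. Returns 1 if there are fewer than five fields.
split_name() {
    local IFS=_ a n     # IFS is restored automatically when the function returns
    set -f              # no pathname expansion on the unquoted $1
    a=( $1 )
    set +f
    n=${#a[@]}
    (( n > 4 )) || return 1
    printf '%s\n' "${a[0]}_${a[1]}__${a[n-2]}_${a[n-1]}"
}

split_name 'S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_2_fastqc.zip'
# prints: S_1004__2_fastqc.zip
```

Note that set +f re-enables pathname expansion unconditionally, so this sketch assumes the caller did not have it disabled.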
The reason why your attempt fails is that sed matches regular expressions from left to right, and without the g flag the s command substitutes only the first match. In sed 's/_[^_]*_/_/' the leftmost match is thus the one that is substituted. If the file name is S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html, the matched part is _1004_ (leftmost) and the result is S_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html.
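The behaviour described above can be reproduced without touching any file, by feeding one of the names from the question to the original sed command:

```shell
#!/usr/bin/env bash
name='S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html'
# Only the first (leftmost) "_..._" sequence, here "_1004_", is replaced.
result=$(printf '%s\n' "$name" | sed 's/_[^_]*_/_/')
printf '%s\n' "$result"
# prints: S_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html
```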
If you really want to use sed for this you could try:
sed 's/^\([^_]*_[^_]*_\).\+\(_[^_]*_[^_]*\)$/\1\2/'
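For completeness, here is a sketch of how that sed command could be plugged into the renaming loop. The \+ operator is a GNU extension, so this version spells it as ..* (one character followed by zero or more), which should also work with other sed implementations. The function name sed_name is hypothetical, and the loop only echoes the mv commands as a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the new name with sed.
# "..*" is the portable spelling of GNU sed's ".\+".
sed_name() {
    printf '%s\n' "$1" |
        sed 's/^\([^_]*_[^_]*_\)..*\(_[^_]*_[^_]*\)$/\1\2/'
}

for f in *.html *.zip; do
    [ -e "$f" ] || continue              # skip patterns that matched nothing
    new=$(sed_name "$f")
    if [ "$new" != "$f" ]; then
        echo mv -- "$f" "$new"           # remove "echo" to actually rename
    fi
done
```

If the pattern does not match, sed prints the name unchanged, which is why the loop compares the old and new names before doing anything.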
Answered By - Renaud Pacalet
Answer Checked By - Marie Seifert (WPSolving Admin)