Issue
I am attempting to replace text data in a git repository using the git filter-branch functionality.
I wrote a simple script to search for various terms and replace them, but it ran extremely slowly. I had multiple lines of Bash executing to customize the search and replacement operations, and I know the code was not very efficient, so I tried running just the first line, which should be reasonably efficient. It still takes forever to walk through the code base.
Is it possible to use Bash or another simple approach to search through my files and execute find-and-replace operations in parallel to speed things up?
If not, are there any other suggestions on how to go about handling this better?
Here's the Git command I'm executing:
git filter-branch --tree-filter "sh /home/kurtis/.bin/redact.sh || true" \
-- --all
Here's the code my command is essentially executing:
find . -not -name "*.sql" -not -name "*.tsv" -not -name "*.class" \
-type f -exec sed -i 's/01dPassw0rd\!/HIDDENPASSWORD/g' {} \;
Solution
git filter-branch cannot process commits in parallel, because it needs the hash (ID) of the parent commit to compute the current commit's hash.
But you can speed up processing of each commit:
Your code executes sed once per file, which is very slow. Use this instead:
find . -not -name "*.sql" -not -name "*.tsv" -not -name "*.class" \
-type f -print0 \
| xargs -0 sed -i 's/01dPassw0rd\!/HIDDENPASSWORD/g'
This version does exactly what yours does, but sed is invoked with as many files (arguments) as possible per run. find's "-print0" and xargs's "-0" mean "separate filenames with a zero byte", so filenames containing spaces, newlines, binary junk, etc. cause no trouble.
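To answer the parallelism part of the question: the commit walk stays serial, but within each tree-filter run you can parallelize the sed invocations themselves, since GNU xargs accepts -P to run several processes at once. A minimal sketch, assuming GNU find/xargs/sed; the scratch directory, file names, and batch size are invented for the demo:

```shell
#!/bin/sh
# Sketch: parallel in-place replacement with find | xargs -P (GNU xargs).
set -eu

# Build a throwaway tree to operate on.
tmp=$(mktemp -d)
printf '01dPassw0rd!\n' > "$tmp/a.txt"
printf '01dPassw0rd!\n' > "$tmp/skip.sql"   # excluded, as in the original find

# -P 4 runs up to four sed processes at once; -n 50 hands each process
# a batch of files so the parallelism has work to spread across.
find "$tmp" -not -name "*.sql" -type f -print0 \
  | xargs -0 -P 4 -n 50 sed -i 's/01dPassw0rd!/HIDDENPASSWORD/g'

a_content=$(cat "$tmp/a.txt")        # replaced
sql_content=$(cat "$tmp/skip.sql")   # untouched (.sql was excluded)
echo "a.txt: $a_content"
echo "skip.sql: $sql_content"
rm -rf "$tmp"
```

Note that -P only pays off when a commit touches many files; for a handful of files the single batched xargs invocation above is already near-optimal. Pre-filtering with `grep -lZ` (feeding sed only the files that actually contain the string) is another speedup worth trying.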
Answered By - Josef Kufner
Answer Checked By - Mary Flores (WPSolving Volunteer)