Issue
Just out of interest, are there any sources on how expensive function calls in Bash really are? I expect them to be several times slower than executing the code within them directly, but I can't seem to find anything about this.
Solution
I don't really agree that performance should not be a worry when programming in bash. It's actually a very good question to ask.
Here's a possible benchmark, comparing the builtin true with the external command true, whose full path is /bin/true on my machine:
$ time for i in {0..1000}; do true; done
real 0m0.004s
user 0m0.004s
sys 0m0.000s
$ time for i in {0..1000}; do /bin/true; done
real 0m2.660s
user 0m2.880s
sys 0m2.344s
Amazing! That's about 2 to 3 ms per call wasted just forking a process (on my machine)!
So next time you have some large text file to process, you'll avoid the mistaken long chains of piped cats, greps, awks, cuts, trs, seds, heads, tails, you-name-its. Besides, UNIX pipes are also very slow (will that be your next question?).
Imagine you have a 1000-line file, and for each line you run one cat, then a grep, then a sed, and then an awk (no, don't laugh, you can see even worse by going through the posts on this site!). Then you're already wasting (on my machine) at least 2 ms × 4 processes × 1000 lines = 8000 ms = 8 s just forking stupid and useless processes.
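To make the point concrete, here is a minimal sketch of the same per-line job done twice: once with external commands on every line, once with builtins only. The input format ("name:value" lines), the variable names and the task itself are made up for illustration, and the ${value^^} expansion assumes bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical input: "name:value" lines. We want the values of the
# lines whose name starts with "user", upper-cased.
input=$'user1:alice\nhost:example\nuser2:bob'

# Fork-heavy style: a command substitution plus two external
# processes (sed, tr) for every single line.
slow=""
while IFS= read -r line; do
    v=$(printf '%s\n' "$line" | sed -n 's/^user[^:]*://p' | tr 'a-z' 'A-Z')
    if [ -n "$v" ]; then slow+="$v"$'\n'; fi
done <<< "$input"

# Builtin-only style: read splits the line, [[ ]] does the match,
# parameter expansion does the upper-casing. No process is forked.
fast=""
while IFS=: read -r name value; do
    [[ $name == user* ]] || continue
    fast+="${value^^}"$'\n'
done <<< "$input"

printf '%s' "$fast"
```

Both loops build the same string; only the second one does it without leaving the shell. When the processing is too complex for builtins, a single awk or sed over the whole file is usually the next thing to try, rather than one process per line.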
To answer your comment about pipes...
### Subshells
Subshells are very slow:
$ time for i in {1..1000}; do (true); done
real 0m2.465s
user 0m2.812s
sys 0m2.140s
Amazing! Over 2 ms per subshell (on my machine).
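When the parentheses are only there for grouping, braces do the same job in the current shell, without a fork. A small sketch (the variable names are mine) that also shows the visible difference between the two: changes made inside ( ... ) are lost, changes made inside { ...; } persist.

```bash
#!/usr/bin/env bash
count=0

# ( ... ) runs in a forked subshell: the increment happens in the
# child process and is gone when the child exits.
for i in {1..5}; do ( count=$((count + 1)) ); done
after_subshell=$count

# { ...; } groups commands in the current shell: no fork, and the
# variable really changes.
for i in {1..5}; do { count=$((count + 1)); }; done
after_group=$count

echo "after subshell loop: $after_subshell, after group loop: $after_group"
```

So a subshell is only worth its cost when you actually want that isolation (a temporary cd, a local change of shell options, and so on).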
### Pipes
Pipes are also very slow (no surprise, given that they involve subshells):
$ time for i in {1..1000}; do true | true; done
real 0m4.769s
user 0m5.652s
sys 0m4.240s
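Amazing! Over 4 ms per pipeline (on my machine). Note that bash runs each command of a pipeline in its own subshell, so most of that is probably the cost of the two subshells (about 2 ms each, as measured in the previous test) rather than of the pipe itself.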
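A common place where a pipeline can simply be dropped is feeding a string to a builtin. The sketch below (variable names are illustrative) compares echo ... | read with a here-string. It assumes the default settings, i.e. the lastpipe option is off, so the read on the right-hand side of the pipe runs in a subshell.

```bash
#!/usr/bin/env bash
line="alpha beta"

# Pipeline: read runs in a subshell, so a and b are not set in the
# current shell afterwards (assuming lastpipe is off, the default).
unset a b
echo "$line" | read -r a b
piped="${a:-unset}"

# Here-string: no pipeline and no subshell, read runs in the current shell.
unset a b
read -r a b <<< "$line"
herestring="$a/$b"

echo "pipe: $piped, here-string: $herestring"
```

The here-string version is not only the one that works, it also avoids the per-pipeline cost measured above.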
### Redirection
$ time for i in {1..1000}; do true > file; done
real 0m0.014s
user 0m0.008s
sys 0m0.008s
So that's pretty fast.
Ok, you probably also want to see it in action with creation of a file:
$ rm file*; time for i in {1..1000}; do true > file$i; done
real 0m0.030s
user 0m0.008s
sys 0m0.016s
Still decently fast.
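Cheap as redirections are, when every iteration writes to the same file you can open it once for the whole loop instead of once per command. A minimal sketch, assuming bash 4 or later for mapfile; the scratch file comes from mktemp and the variable names are mine:

```bash
#!/usr/bin/env bash
out=$(mktemp)   # scratch file, name chosen by mktemp

# Variant 1: the redirection is on the command, so the file is
# opened and closed 1000 times.
: > "$out"
for i in {1..1000}; do echo "$i" >> "$out"; done
mapfile -t lines < "$out"; per_command=${#lines[@]}

# Variant 2: the redirection is on the loop, so the file is opened once.
for i in {1..1000}; do echo "$i"; done > "$out"
mapfile -t lines < "$out"; per_loop=${#lines[@]}

rm -f "$out"
echo "per-command: $per_command lines, per-loop: $per_loop lines"
```

Both variants produce the same file. Prefix either loop with time if you want to see what the difference is on your machine.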
### Pipes vs redirections
In your comment, you mention:
sed '' filein > filetmp; sed '' filetmp > fileout
vs
sed '' filein | sed '' > fileout
(Of course, the best thing would be to use a single instance of sed (it's usually possible), but that doesn't answer the question.)
Let's check that out:
A funny way:
$ rm file*
$ > file
$ time for i in {1..1000}; do sed '' file | sed '' > file$i; done
real 0m5.842s
user 0m4.752s
sys 0m5.388s
$ rm file*
$ > file
$ time for i in {1..1000}; do sed '' file > filetmp$i; sed '' filetmp$i > file$i; done
real 0m6.723s
user 0m4.812s
sys 0m5.800s
So it seems faster to use a pipe rather than a temporary file (for sed). In fact, this could have been understood without typing the lines: in a pipe, as soon as the first sed spits out something, the second sed starts processing the data. In the second case, the first sed does its job, and only then does the second sed do its job.
So our experiment is not a good way of determining whether pipes are better than redirections, especially since the input file is empty here, so the loops mostly measure process startup.
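As for the single instance of sed mentioned earlier, here is a small sketch of what that looks like. The substitutions and the sample input are made up; the point is only that several -e expressions run in order inside one process.

```bash
#!/usr/bin/env bash
input=$'foo bar\nbar foo'

# Two sed processes connected by a pipe.
two=$(printf '%s\n' "$input" | sed 's/foo/FOO/' | sed 's/bar/BAR/')

# One sed process applying both expressions, in order, to each line.
one=$(printf '%s\n' "$input" | sed -e 's/foo/FOO/' -e 's/bar/BAR/')

printf '%s\n' "$one"
```

The output is identical, with one fork fewer and no pipe or temporary file between the two steps.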
How about process substitutions?
$ rm file*
$ > file
$ time for i in {1..1000}; do sed '' > file$i < <(sed '' file); done
real 0m7.899s
user 0m1.572s
sys 0m3.712s
Wow, that's slow! Hey, but observe the user and system CPU usage: much less than the other two possibilities (if someone can explain that...)
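To come back to the original question: the same kind of loop can be pointed at a function call. This is only a sketch, with a hypothetical function bump and an arbitrary iteration count, and I'm not quoting numbers since they depend on the machine. What is certain is that bash runs a function in the current shell, with no new process, so it belongs with the builtin true measurement rather than with the /bin/true one.

```bash
#!/usr/bin/env bash
n=100000              # arbitrary iteration count
TIMEFORMAT='%3R s'    # make `time` print only wall-clock seconds

# Baseline: the loop body written inline.
inline=0
echo "inline:"
time for ((i = 0; i < n; i++)); do inline=$((inline + 1)); done

# Same body wrapped in a (hypothetical) function.
bump() { called=$((called + 1)); }
called=0
echo "function:"
time for ((i = 0; i < n; i++)); do bump; done
```

The difference between the two timings, divided by n, is a rough estimate of the per-call overhead on your machine.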
Answered By - gniourf_gniourf Answer Checked By - Willingham (WPSolving Volunteer)