Sunday, November 20, 2022

[SOLVED] How expensive is a bash function call really?

November 20, 2022 bash

Issue

Just out of interest, are there any sources on how expensive function calls in Bash really are? I expect them to be several times slower than executing the code within them directly, but I can't seem to find anything about this.

Solution

I don't really agree that performance should not be a worry when programming in bash. It's actually a very good question to ask.

Here's a possible benchmark, comparing the builtin true and the command true, the full path of which is /bin/true on my machine.

On my machine:

$ time for i in {0..1000}; do true; done

real    0m0.004s
user    0m0.004s
sys     0m0.000s
$ time for i in {0..1000}; do /bin/true; done

real    0m2.660s
user    0m2.880s
sys     0m2.344s

Amazing! That's about 2 to 3 ms wasted by just forking a process (on my machine)!

So next time you have some large text file to process, you'll avoid the mistaken long chains of piped cats, greps, awks, cuts, trs, seds, heads, tails, you-name-its. Besides, UNIX pipes are also very slow (will that be your next question?).
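As a rough sketch of what avoiding those chains can look like, the loop below filters and splits records using only bash builtins, so nothing is forked per line. The sample data and the names input, name, role, id and admins are made up for illustration.

```bash
#!/usr/bin/env bash
# Made-up input: colon-separated records, one per line.
input=$'alice:admin:1\nbob:user:2\ncarol:admin:3'

# The forking style would run something like this for every line:
#   role=$(echo "$line" | cut -d: -f2)

admins=()
while IFS=: read -r name role id; do
    # read, [[ ... ]] and the array append are all builtins:
    # no process is forked inside this loop.
    [[ $role == admin ]] && admins+=("$name")
done <<< "$input"

echo "${admins[*]}"   # prints: alice carol
```

Whether this is worth it depends on the job: for a large file, a single awk or sed over the whole input is usually the better tool, and the point here is only to avoid starting several processes per line.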

Imagine you have a 1000 line file, and in each line you put one cat then a grep then a sed and then an awk (no, don't laugh, you can see even worse by going through the posts on this site!), then you're already wasting (on my machine) at least 2 ms × 4 processes × 1000 lines = 8000 ms = 8 s just forking stupid and useless processes.
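Coming back to the question as asked: a function call does not fork anything, it runs in the current shell, so it can be timed with the same kind of loop. This is only a sketch; noop is a made-up function name, and no timings are quoted here since they depend on the machine.

```bash
#!/usr/bin/env bash
# Made-up helper: a function whose body is a single builtin.
noop() { true; }

inline_count=0
call_count=0

# Baseline: run the builtin directly in the loop body.
time for i in {1..1000}; do true; inline_count=$((inline_count + 1)); done

# Same work, routed through a function call (still no fork).
time for i in {1..1000}; do noop; call_count=$((call_count + 1)); done

echo "inline iterations: $inline_count, function calls: $call_count"
```

The counters are only there to confirm that both loops did the same number of iterations; the two timing reports printed by time are what you would compare.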

To answer your comment about pipes...

Subshells

Subshells are very slow:

$ time for i in {1..1000}; do (true); done

real    0m2.465s
user    0m2.812s
sys     0m2.140s

Amazing! Over 2 ms per subshell (on my machine).
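Command substitution, $(...), also runs its command in a subshell, so a function that hands back its result on stdout pays this cost on every call. A possible way around it, sketched here with the made-up function names greet_stdout and greet_var, is to let the function store its result in a variable named by the caller using printf -v.

```bash
#!/usr/bin/env bash
# Made-up helpers that build the same string in two ways.

# Result goes to stdout: the caller needs $(...), which means a subshell.
greet_stdout() { printf 'hello, %s' "$1"; }

# Result is stored in the variable whose name is passed as $1.
# printf -v is part of the printf builtin, so no subshell is needed.
greet_var() { printf -v "$1" 'hello, %s' "$2"; }

time for i in {1..1000}; do a=$(greet_stdout world); done
time for i in {1..1000}; do greet_var b world; done

echo "$a / $b"
```

Both variables end up with the same string; only the way the result travels back to the caller differs.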

Pipes

Pipes are also very slow (this should be obvious, given that they involve subshells):

$ time for i in {1..1000}; do true | true; done

real    0m4.769s
user    0m5.652s
sys     0m4.240s

Amazing! Over 4 ms per pipe (on my machine), so that's 2 ms for just the pipe after subtracting the time for the subshell.

Redirection

$ time for i in {1..1000}; do true > file; done

real    0m0.014s
user    0m0.008s
sys     0m0.008s

So that's pretty fast.

Ok, you probably also want to see it in action with creation of a file:

$ rm file*; time for i in {1..1000}; do true > file$i; done

real    0m0.030s
user    0m0.008s
sys     0m0.016s

Still decently fast.
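Even though a redirection is cheap, inside a loop the file is still opened and closed on every iteration. When all iterations write to the same file, the redirection can be attached to the loop itself so that the file is opened once. A sketch, using a temporary file from mktemp as a stand-in for a real output file (the variable names are made up):

```bash
#!/usr/bin/env bash
# Scratch file standing in for a real output file.
out=$(mktemp "${TMPDIR:-/tmp}/bench.XXXXXX")

# Per-iteration redirection: the file is opened and closed 1000 times.
time for i in {1..1000}; do echo "$i" >> "$out"; done
per_iter_lines=$(( $(wc -l < "$out") ))

# Loop-level redirection: the file is opened (and truncated) once.
time for i in {1..1000}; do echo "$i"; done > "$out"
whole_loop_lines=$(( $(wc -l < "$out") ))

rm -f "$out"
echo "$per_iter_lines $whole_loop_lines"
```

Both variants write the same 1000 lines; the line counts are only a sanity check, and the two reports from time are what to compare.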

Pipes vs redirections:

In your comment, you mention:

sed '' filein > filetmp; sed '' filetmp > fileout

sed '' filein | sed '' > fileout

(Of course, the best thing would be to use a single instance of sed (it's usually possible), but that doesn't answer the question.)

Let's check that out:

A funny way:

$ rm file*
$ > file
$ time for i in {1..1000}; do sed '' file | sed '' > file$i; done

real    0m5.842s
user    0m4.752s
sys     0m5.388s
$ rm file*
$ > file
$ time for i in {1..1000}; do sed '' file > filetmp$i; sed '' filetmp$i > file$i; done

real    0m6.723s
user    0m4.812s
sys     0m5.800s

So it seems faster to use a pipe rather than using a temporary file (for sed). In fact, this could have been understood without typing the lines: in a pipe, as soon as the first sed spits out something, the second sed starts processing the data. In the second case, the first sed does its job, and then the second sed does its job.

So our experiment is not a good way of determining if pipes are better than redirections.

How about process substitutions?

$ rm file*
$ > file
$ time for i in {1..1000}; do sed '' > file$i < <(sed '' file); done

real    0m7.899s
user    0m1.572s
sys     0m3.712s

Wow, that's slow! Hey, but observe the user and system CPU usage: much less than the other two possibilities (if someone can explain that...)

Answered By - gniourf_gniourf


This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0