Issue
I was using Compiler Explorer and noticed that GCC and Clang emit seemingly unnecessary stack-related instructions when compiling this simple function (Compiler Explorer: https://godbolt.org/z/ex41MYjEr).
void bar(void);

int foo(void) {
    bar();
    return 42;
}
Here is the result of the compilation (also visible in Compiler Explorer via the link above). -mabi=sysv has no effect on the output assembly, but I included it to rule out the ABI as the cause of the strange assembly.
// Expected output:
foo:
call bar
mov eax, 42
ret
// gcc -O3 -mabi=sysv
// Why is it reserving unused space in the stack frame?
foo:
sub rsp, 8
call bar
mov eax, 42
add rsp, 8
ret
// clang -O3 -mabi=sysv
// Why is it preserving a scratch register then moving it to another unused scratch register?
foo:
push rax
call bar@PLT
mov eax, 42
pop rcx
ret
Why is the stack frame modified even though the function does not use the stack?
I found this strange, since it seems like an easy optimization for major compilers like GCC and Clang to perform when working with a known ABI.
I have a couple of theories, but I was hoping to get some clarification.
- Maybe this is done to prevent an infinite loop in the event that bar calls foo recursively? By consuming a small amount of stack space on each call, we ensure that the program eventually segfaults when it runs out of stack space. Maybe clang is doing the same thing, but it uses push and pop to allow for better pipelining in some situations? If this is the case, are there any CLI arguments I can use to disable this behavior? However, this seems like a non-issue, since call pushes rip onto the stack anyway on x86-64.
- Maybe there is some quirk of C or the AMD64 System V ABI that I am unaware of?
- Perhaps I am overthinking this and the strange assembly is simply the result of poor register/stack optimization. Maybe the stack was used at some point in the compilation process, but after those uses were optimized away the compiler was unable to remove the stack adjustment.
Solution
Alignment.
The call instruction pushes 8 bytes onto the stack (the return address), so on entry to foo the stack pointer is 8 bytes off a 16-byte boundary. The optimized functions adjust it by another 8 bytes so that the stack pointer is 16-byte aligned again before they call bar.
I believe this is a requirement of the ABI, so that 128-bit SSE register values can be spilled to naturally aligned addresses, which is important to avoid a performance hit or a fault, depending on the CPU configuration, and/or so that SSE instructions can be used for optimized block moves from suitably aligned addresses.
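The arithmetic behind that 8-byte adjustment can be sketched in C. This is only an illustrative model, not anything the compilers run: the helper names rsp_on_entry and padding_needed, the two constants, and the sample stack address used below are all made up for this example, and it assumes the caller's stack pointer was 16-byte aligned just before its call instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for this sketch: alignment required at a call instruction. */
#define CALL_ALIGNMENT 16u
/* Size of the return address that `call` pushes on x86-64. */
#define RETURN_ADDRESS_SIZE 8u

/* Hypothetical helper: value of rsp on entry to a function, given the
 * caller's rsp just before `call` (required to be 16-byte aligned). */
uint64_t rsp_on_entry(uint64_t rsp_before_call) {
    assert(rsp_before_call % CALL_ALIGNMENT == 0);
    return rsp_before_call - RETURN_ADDRESS_SIZE;
}

/* Hypothetical helper: bytes to subtract from rsp (the stack grows
 * downward) so that it lands on a 16-byte boundary again. */
uint64_t padding_needed(uint64_t rsp) {
    return rsp % CALL_ALIGNMENT;
}
```

For an arbitrary aligned caller stack pointer such as 0x7ffc00001000, rsp_on_entry gives an address that is 8 mod 16, and padding_needed returns 8 for it. That is the same 8 bytes that GCC's sub rsp, 8 and Clang's push rax remove before calling bar.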
The clang and gcc cases are effectively identical: you don't care what was written to that stack slot, or which scratch (call-clobbered) register was overwritten, only that the stack pointer was adjusted.
Answered By - Jonathon Reinhart Answer Checked By - Willingham (WPSolving Volunteer)