Issue
First of all: I know that there are a lot of web pages (including discussions on Stack Overflow) where the differences between .bss and .data for the data declaration are discussed, but I have a specific question and unfortunately I did not find the answer on those pages, so I ask it here :-).
I am a complete beginner in assembly, so I apologize if the question is stupid :-).
I am learning assembly on an x86 64-bit linux OS, but I think that my question is more general and probably not specific to the OS / the architecture.
I find the definition of the .bss and .data sections a bit strange. I can always declare a variable in .bss and then move a value into this variable from my code (.text section), right? So why should I declare a variable in the .data section, when I know that variables declared in this section will add to the size of my executable file?
I could ask this question in the context of C programming as well: why should I initialize my variable when I declare it, if it is more efficient to declare it uninitialized and then assign a value to it at the beginning of my code?
I suppose that my approach to memory management is naive and not correct, but I do not understand why.
Solution
.bss is where you put zero-initialized static data, like C int x; (at global scope). That's the same as int x = 0; for static / global variables (static storage class; see footnote 1).
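A minimal C sketch of zero-initialized static data, assuming a typical ELF toolchain (GCC or Clang on Linux) that places these objects in .bss. The names counter, big_buf and bss_is_zero are hypothetical. You could confirm where each object landed with objdump -t on the compiled object file.

```c
#include <stddef.h>

/* Zero-initialized objects with static storage duration. On typical
   ELF toolchains these go in .bss: the file only records their size,
   and the loader hands the program zeroed memory at run time. */
int counter;                  /* same as: int counter = 0; */
static char big_buf[4096];    /* 4 KiB of zeros, costs ~0 bytes in the file */

/* Hypothetical helper: check that the loader really provided zeros. */
int bss_is_zero(void) {
    if (counter != 0)
        return 0;
    for (size_t i = 0; i < sizeof big_buf; i++)
        if (big_buf[i] != 0)
            return 0;
    return 1;
}
```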
.data is where you put non-zero-initialized static data, like int x = 2;
If you put that in BSS, you'd need a runtime static "constructor" to initialize the BSS location, like what a C++ compiler would do for static const int prog_starttime = __rdtsc();. (Even though it's const, the initializer isn't a compile-time constant, so it can't go in .rodata.)
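To contrast the two options in C: standard C has no way to run code before main, so this sketch assumes GCC or Clang and uses their non-standard __attribute__((constructor)) to play the role of the runtime static "constructor". The names x_data, x_bss and init_x_bss are hypothetical.

```c
/* .data version: the value 2 is stored in the executable image
   and simply mapped in by the loader. */
int x_data = 2;

/* .bss version: the file stores no value, so code has to run
   before main() to put 2 there. */
int x_bss;

/* GCC/Clang extension (not standard C): runs before main(). */
__attribute__((constructor))
static void init_x_bss(void) {
    x_bss = 2;
}
```

Both variables hold 2 by the time main starts, but the second one paid for it with extra code in .text plus the time to execute it.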
.bss with a runtime initializer would make sense for big arrays that are mostly zero or filled with the same value (memset / rep stosd), but in practice writing char buf[1024000] = {1}; will put 1MB of almost all zeros into .data with current compilers.
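The case where .bss plus a runtime initializer does pay off can be sketched in C like this: a 1 MiB array that must start out filled with one non-zero byte. Left zero-initialized, it can live in .bss and cost almost nothing in the file; the memset is the C-level equivalent of the memset / rep stosd approach. The size, the fill value 0xAB, and the names filled, init_filled and filled_at are hypothetical.

```c
#include <string.h>

#define FILLED_LEN (1u << 20)   /* 1 MiB, hypothetical size */

/* Zero-initialized, so it can go in .bss: no file space needed. */
static unsigned char filled[FILLED_LEN];

/* Runtime initializer: must be called once before first use. */
void init_filled(void) {
    memset(filled, 0xAB, sizeof filled);
}

unsigned char filled_at(size_t i) {
    return filled[i];
}
```

Writing the same contents as a static initializer would need the full 1 MiB of 0xAB bytes stored in .data.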
Otherwise, it is not more efficient. A mov dword [myvar], imm32 instruction (with a RIP-relative address) is 10 bytes long, costing over twice as many bytes in your executable as the 4 bytes the value would take if it were statically initialized in .data. Also, the initializer code has to be executed as well as loaded, and the 4 bytes of BSS space still take up RAM at run time.
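The byte count can be checked by writing out the instruction encoding. This sketch assumes x86-64 NASM with default rel, so [myvar] is RIP-relative: opcode C7 /0 (mov r/m32, imm32), ModRM 0x05 for [rip+disp32], then disp32 and imm32. The displacement depends on where myvar is placed, so zeros stand in for it; the array and function names are hypothetical. (Without default rel, NASM uses an absolute address in 64-bit mode, which needs an extra SIB byte.)

```c
#include <stddef.h>
#include <stdint.h>

/* Machine code for: mov dword [myvar], 2   (RIP-relative) */
static const unsigned char mov_myvar_imm32[] = {
    0xC7, 0x05,              /* opcode + ModRM ([rip+disp32]) */
    0x00, 0x00, 0x00, 0x00,  /* disp32: placeholder, set by the assembler/linker */
    0x02, 0x00, 0x00, 0x00   /* imm32 = 2, little-endian */
};

/* The same value statically initialized in .data: just the data bytes. */
int32_t myvar_in_data = 2;

size_t code_bytes(void) { return sizeof mov_myvar_imm32; }
size_t data_bytes(void) { return sizeof myvar_in_data; }
```

That is 10 bytes of code versus 4 bytes of data, and the BSS version still needs its 4 bytes of RAM on top.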
By contrast, section .rodata (or .rdata on Windows) is where compilers put string literals, FP constants, and static const int x = 123;. (Actually, x would normally get inlined as an immediate everywhere it's used in the compilation unit, letting the compiler optimize away any static storage. But if you took its address and passed &x to a function, the compiler would need it to exist in memory somewhere, and that would be in .rodata.)
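A small C sketch of the .rodata cases, assuming GCC or Clang with optimization enabled on an ELF target. In x_plus_one the compiler typically just emits the constant 124, so that use needs no storage; returning &x forces x to exist in memory, typically in .rodata, as does the string literal. The function names are hypothetical.

```c
/* Read-only constant: a candidate for .rodata, or for no storage at all. */
static const int x = 123;

/* Value use only: usually folded to an immediate at -O2. */
int x_plus_one(void) {
    return x + 1;
}

/* Address taken: x now has to live somewhere in memory. */
const int *address_of_x(void) {
    return &x;
}

/* String literals are usually placed in .rodata as well. */
const char *greeting(void) {
    return "hello";
}
```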
Footnote 1: Inside a function, int x; would be on the stack (if the compiler didn't optimize it away or keep it in registers) when compiling for a normal register machine with a stack, like x86.
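To make footnote 1 concrete: a local int x without static is an automatic variable, so every active call of a function gets its own instance (typically a stack slot once its address is taken), unlike the single .bss object a global int x; gets. This recursive sketch, with the hypothetical name locals_are_per_call, checks that the callee's x is a different object from its caller's.

```c
#include <stddef.h>

/* Returns 1 if every nested call got its own x, 0 otherwise. */
int locals_are_per_call(int depth, const int *caller_local) {
    int x = depth;                      /* automatic storage, one per call */
    if (caller_local != NULL && caller_local == &x)
        return 0;                       /* would mean a shared slot: not expected */
    if (depth == 0)
        return 1;
    return locals_are_per_call(depth - 1, &x);
}
```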
Regarding "I could ask this question in the context of C programming as well":
In C, an optimizing compiler will treat int x; x=5; pretty much identically to int x=5; inside a function. No static storage is involved. Looking at actual compiler output is often instructive: see How to remove "noise" from GCC/clang assembly output?
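A pair of functions you could feed to the compiler to see this, for example with gcc -O2 -S and a comparison of the two resulting bodies. The expectation that both compile to the same code is typical optimizer behavior rather than a language guarantee; the function names are hypothetical.

```c
/* Declare, then assign. */
int assign_after_declare(void) {
    int x;
    x = 5;
    return x * 2;
}

/* Initialize in the declaration. */
int initialize_in_declaration(void) {
    int x = 5;
    return x * 2;
}
```

With optimization on, both usually reduce to returning the constant 10, with no memory involved at all.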
Outside a function, at global scope, you can't write statements like x=5;. You could do that at the top of main, but then you would be tricking the compiler into making worse code.
Inside a function, with static int x = 5; the initialization happens once (at compile time). If you wrote static int x; x=5; instead, the static storage would be re-assigned every time the function was entered, and you might as well not have used static, unless you have other reasons for needing static storage class (e.g. returning a pointer to x that's still valid after the function returns).
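The difference between the two forms of static local shows up as soon as the function is called twice. In this sketch (hypothetical names), count_from_five keeps its state across calls because n is initialized only once, while count_reassigned resets n on every call, so its static storage buys nothing. persistent_slot shows the legitimate reason mentioned above: a pointer to a static local stays valid after the function returns.

```c
/* Initialized once; the initial value 5 is part of the static image. */
int count_from_five(void) {
    static int n = 5;
    return ++n;          /* 6, 7, 8, ... on successive calls */
}

/* Static storage, but the assignment runs on every call. */
int count_reassigned(void) {
    static int n;
    n = 5;
    return ++n;          /* always 6 */
}

/* Static storage needed: the returned pointer outlives the call. */
int *persistent_slot(void) {
    static int slot;
    return &slot;
}
```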
Answered By - Peter Cordes
Answer Checked By - Clifford M. (WPSolving Volunteer)