Saturday, December 2, 2023

[SOLVED] Is there an inline assembly constraint for 32-bit immediate for x86-64

Tags: gcc, immediate-operand, inline-assembly, x86-64

Issue

Is there a constraint for x86-64 that is similar to the "i" constraint, but that only matches when the operand value fits in a 32-bit signed immediate?

For the function shown below, I would like gcc to use lock add mem, imm when the operand fits into a 32-bit signed immediate, and I would like it to use the "r" constraint and generate mov r, imm; lock add mem, r when the immediate doesn't fit.

The code as shown works correctly when v is a non-constant value or a constant that fits in a signed 32-bit immediate, but gcc generates an invalid instruction when used with a constant value that doesn't fit in a signed 32-bit immediate operand.*

static inline void atomic_add(volatile unsigned long *m, unsigned long v) 
{
    asm volatile ("lock addq %1, %0" : "+m"(*m) : "ri"(v));
}

I tried using "n" instead of "i" in the constraint, but it seems to work the same as "i". Removing the "i" constraint works in all cases, but it moves the immediate into a register even when it isn't necessary. Since the vast majority of uses have a constant that fits in 8 or 32 bits, I would rather not use that solution.

Here is an example demonstrating the problem: https://godbolt.org/z/nPY46Kfdh

extern unsigned long x;

unsigned long m(volatile unsigned long *v)
{
    atomic_add(v, 12ul);
    atomic_add(v, 12345ul);
    atomic_add(v, 123456789000ul);
    atomic_add(v, x);
    return *v;
}

* There are plenty of answers here explaining why a 64-bit immediate isn't allowed in the add instruction, so there is no need for yet another explanation why it isn't supported.

Solution

Turning the comment into an answer...

Looking at the machine constraints for the x86 family in the GCC documentation (scroll way, way down), we see:

e: 32-bit signed integer constant, or a symbolic reference known to fit that range (for immediate operands in sign-extending x86-64 instructions).

That seems to do what you're looking for.
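As a rough sketch of what that looks like (x86-64 only, GCC-style inline asm assumed), here is the question's wrapper with "e" swapped in for "i". The add_examples driver is a hypothetical name that just mirrors the question's test function, and __asm__ is used only so the snippet also builds under strict -std= modes.

```c
/* Sketch for x86-64 with GCC-style inline asm. The only functional change
   from the question's wrapper is "e" in place of "i". */
static inline void atomic_add(volatile unsigned long *m, unsigned long v)
{
    /* "e": constant that fits a sign-extended 32-bit immediate.
       "r": fallback for everything else (variables, wider constants). */
    __asm__ volatile ("lock addq %1, %0" : "+m"(*m) : "er"(v));
}

/* Hypothetical driver mirroring the calls in the question. */
unsigned long add_examples(volatile unsigned long *v, unsigned long x)
{
    atomic_add(v, 12ul);            /* small constant: can be an immediate */
    atomic_add(v, 12345ul);         /* still fits in 32 bits signed */
    atomic_add(v, 123456789000ul);  /* too wide: must go through a register */
    atomic_add(v, x);               /* not a constant: register */
    return *v;
}
```

With optimization on, the first two calls should be able to use the immediate form, while the last two fall back to "r". It's worth checking the generated asm on your own compiler version to confirm.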

Also, it appears that both Peter and I belong to the school of thought that says don't use inline asm. So whenever possible, I recommend using intrinsics rather than asm blocks. In this case that's probably __atomic_fetch_add(m, v, __ATOMIC_SEQ_CST).

I get that you may not want to take the time right now to rework the entire project to use the newer atomic functions, but it might make sense to start the migration with this one. Especially if there's a wrapper where you can just drop in the new code.
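If there is such a wrapper, a minimal version of the swap might look like the following. This assumes sequentially consistent ordering is what the existing callers expect, which is a guess on my part.

```c
/* Possible drop-in replacement for the asm wrapper, using the GCC/Clang
   __atomic builtin instead of an asm block. */
static inline void atomic_add(volatile unsigned long *m, unsigned long v)
{
    /* The fetched value is discarded, so the compiler does not need to
       hand the old value back and picks the operand encoding itself. */
    (void)__atomic_fetch_add(m, v, __ATOMIC_SEQ_CST);
}
```

The signature is unchanged, so callers don't need to be touched, and choosing between an immediate and a register becomes the compiler's problem.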

One last thought: I notice you're not using a "memory" clobber with the asm. Without one, the compiler is free to move other (non-volatile) memory accesses across the asm statement, so depending on how you're using this routine, it's possible you're introducing an ordering bug here. You might want to give it a quick check to be sure it's doing what you intend.
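For reference, a variant with the clobber added would look roughly like this. The name atomic_add_barrier is made up for illustration, and the "er" constraint is the one from the constraint list quoted above. Whether you actually need the clobber depends on whether callers rely on this add to order their other loads and stores.

```c
/* Sketch: same asm as before, plus a "memory" clobber so the statement
   also acts as a compiler-level barrier. x86-64, GCC-style asm assumed. */
static inline void atomic_add_barrier(volatile unsigned long *m, unsigned long v)
{
    __asm__ volatile ("lock addq %1, %0"
                      : "+m"(*m)      /* read-modify-write memory operand */
                      : "er"(v)       /* imm32 if it fits, else a register */
                      : "memory");    /* do not move memory accesses across this asm */
}
```

The cost is that the compiler has to assume all memory may have changed at this point, so it can't keep values cached in registers across the call.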

Answered By - David Wohlferd

Answer Checked By - Marilyn (WPSolving Volunteer)

This answer was collected from Stack Overflow and tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
