Issue
While watching process creation when typing 'ls' in a terminal, I set a breakpoint at copy_thread in arch/x86/kernel/process.c with gdb and then printed the values of pt_regs:
{bx = 0x1200011, cx = 0x0, dx = 0x0, si = 0x0, di = 0xa0f38e8, bp = 0x8266000,
ax = 0xffffffda, ds = 0x7b, __dsh = 0x0, es = 0x7b, __esh = 0x0, fs = 0x0, __fsh = 0x0,
gs = 0x33, __gsh = 0x0, orig_ax = 0x78, ip = 0xb7f29549, cs = 0x73, __csh = 0x0, flags = 0x206,
sp = 0xbfab35f0, ss = 0x7b, __ssh = 0x0}
The bp of pt_regs is 0x8266000 and the sp of pt_regs is 0xbfab35f0. I have found where they are assigned. The sp of pt_regs is assigned in do_SYSENTER_32 in arch/x86/entry/common.c:
__visible noinstr long do_SYSENTER_32(struct pt_regs *regs)
{
/* SYSENTER loses RSP, but the vDSO saved it in RBP. */
regs->sp = regs->bp;
/* SYSENTER clobbers EFLAGS.IF. Assume it was set in usermode. */
regs->flags |= X86_EFLAGS_IF;
return do_fast_syscall_32(regs);
}
The bp of pt_regs is assigned in __do_fast_syscall_32 by get_user. It seems to come from a userspace value.
static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
{
// do other stuff...
/* Fetch EBP from where the vDSO stashed it. */
if (IS_ENABLED(CONFIG_X86_64)) {
/*
* Micro-optimization: the pointer we're following is
* explicitly 32 bits, so it can't be out of range.
*/
res = __get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp);
} else {
res = get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp);
}
// do other stuff...
return true;
}
The backtrace shows the order of the function calls:
#0 copy_thread (clone_flags=clone_flags@entry=18874368, sp=0, arg=0, p=0xc31c0a00, tls=0)
at arch/x86/kernel/process.c:133
#1 0xc1058722 in copy_process (pid=pid@entry=0x0, trace=trace@entry=0, node=node@entry=-1,
args=<optimized out>) at kernel/fork.c:2122
#2 0xc10593cc in kernel_clone (args=args@entry=0xc68e9f38) at kernel/fork.c:2500
#3 0xc1059807 in __do_sys_clone (child_tidptr=0xa0f38e8, tls=0, parent_tidptr=0x0, newsp=0,
clone_flags=<optimized out>) at kernel/fork.c:2617
#4 __se_sys_clone (child_tidptr=168769768, tls=0, parent_tidptr=0, newsp=0,
clone_flags=<optimized out>) at kernel/fork.c:2585
#5 __ia32_sys_clone (regs=<optimized out>) at kernel/fork.c:2585
#6 0xc1b04b85 in do_syscall_32_irqs_on (nr=<optimized out>, regs=0xc68e9fb4)
at arch/x86/entry/common.c:77
#7 __do_fast_syscall_32 (regs=regs@entry=0xc68e9fb4) at arch/x86/entry/common.c:140
#8 0xc1b04c29 in do_fast_syscall_32 (regs=0xc68e9fb4) at arch/x86/entry/common.c:165
#9 0xc1b04c75 in do_SYSENTER_32 (regs=<optimized out>) at arch/x86/entry/common.c:208
#10 0xc1b0e32f in entry_SYSENTER_32 () at arch/x86/entry/entry_32.S:952
#11 0x01200011 in ?? ()
#12 0x00000000 in ?? ()
My questions are:
- Why do the ebp and esp values stored in pt_regs differ so greatly?
- Why is the ebp value stored in pt_regs smaller than the esp value stored in pt_regs, given that the stack grows downward?
I used a debuggable linux-5.12.10 kernel, and the 'ls' command is built from busybox.
Solution
Consider the difference in register and stack usage for the legacy INT $0x80 system call mechanism and the modern fast system call mechanism for IA32:
| Register / stack | Legacy system call | Fast system call |
|---|---|---|
| eax | system call number | system call number |
| ebx | arg1 | arg1 |
| ecx | arg2 | arg2 |
| edx | arg3 | arg3 |
| esi | arg4 | arg4 |
| edi | arg5 | arg5 |
| ebp | arg6 | user stack pointer |
| arg on user stack | (not used) | arg6 |
For the fast system call mechanism, when entry_SYSENTER_32 constructs the struct pt_regs entry on the kernel stack, the sp member will point to the kernel stack and the bp member will point to the user stack. Therefore, the fast system call mechanism fixes up the sp and bp members for compatibility with the legacy system call mechanism. The sp member value is corrected in do_SYSENTER_32():
/* SYSENTER loses RSP, but the vDSO saved it in RBP. */
regs->sp = regs->bp;
The bp member value is corrected in __do_fast_syscall_32(), which sets it to the arg6 value from the user stack:
/* Fetch EBP from where the vDSO stashed it. */
if (IS_ENABLED(CONFIG_X86_64)) {
/*
* Micro-optimization: the pointer we're following is
* explicitly 32 bits, so it can't be out of range.
*/
res = __get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp);
} else {
res = get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp);
}
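When do_syscall_32_irqs_on() is called from do_int80_syscall_32() (for the legacy system call mechanism) or from __do_fast_syscall_32() (for the fast system call mechanism), the regs->bp and regs->sp values will be as expected no matter which of the system call mechanisms was used.
In the case shown in the question, regs->sp (0xbfab35f0) is the user stack pointer, while regs->bp (0x8266000) is arg6. The i386 clone() system call takes only five arguments, so that arg6 value is simply whatever the caller had in EBP before entering the vDSO. It is evidently not a stack address, so there is no reason for it to be close to regs->sp, or above it.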
Another fix-up for fast system calls occurs for regs->ip. The original value of the EIP register is lost by the sysenter instruction, which is normally executed from the __kernel_vsyscall() function in the vDSO. regs->ip is corrected in do_fast_syscall_32():
/*
* Called using the internal vDSO SYSENTER/SYSCALL32 calling
* convention. Adjust regs so it looks like we entered using int80.
*/
unsigned long landing_pad = (unsigned long)current->mm->context.vdso +
vdso_image_32.sym_int80_landing_pad;
/*
* SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
* so that 'regs->ip -= 2' lands back on an int $0x80 instruction.
* Fix it up.
*/
regs->ip = landing_pad;
The vDSO contains an int $0x80 instruction immediately after the sysenter instruction. The landing_pad value is the address just after that int $0x80 instruction, so that instruction will not be reached when returning from the fast system call.
The reason for the int $0x80 instruction in the vDSO is to support older CPUs that lack the sysenter and sysexit instructions. In that case, the mov %esp, %ebp; sysenter instruction sequence in __kernel_vsyscall() in the vDSO will be replaced with nop instructions, and the CPU will reach the int $0x80 instruction that immediately follows that sequence, effectively changing the fast system call into a legacy system call on older CPUs. That legacy system call will return to the point just after the int $0x80 instruction, just like the fast system call.
Answered By - Ian Abbott Answer Checked By - Robin (WPSolving Admin)