Linux x64 call convention

Contents

Calling Conventions
Basics
Cheat Sheets
What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Kernel Interface
x86-64 Linux System Call convention:
User Interface: function calling
x86-64 System V user-space Function Calling convention:

Calling Conventions

Calling external functions in C, and calling C functions from other languages, is a common issue in OS programming, especially where the other language is assembly. This page will concentrate primarily on the latter case, but some consideration is made for other languages as well.

Some of what is described here is imposed by the x86 architecture; some is specific to the GCC toolchain. Some is configurable, and you could even build your own GCC target to support a different calling convention. Currently, this page makes no effort to distinguish which is which.

Basics

As a general rule, a function which follows the C calling conventions, and is appropriately declared (see below) in the C headers, can be called as a normal C function. Most of the burden for following the calling rules falls upon the assembly program.

Cheat Sheets

Here is a quick overview of common calling conventions. Note that the calling conventions are usually more complex than represented here (for instance, how is a large struct returned? How about a struct that fits in two registers? How about va_list’s?). Look up the specifications if you want to be certain. It may be useful to write a test function and use gcc -S to see how the compiler generates code, which may give a hint of how the calling convention specification should be interpreted.
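As a rough sketch of the "write a test function and use gcc -S" suggestion, the file below collects a few probe functions that exercise the cases mentioned above (a struct that may fit in two registers, a large struct, and a variadic function). The file name abi_probe.c and all function and type names are made up for illustration.

```c
/* abi_probe.c (hypothetical name): probe functions for inspecting how the
 * compiler passes and returns values. Generate assembly with, for example:
 *     gcc -S -O1 abi_probe.c
 * and read the resulting abi_probe.s. */
#include <stdarg.h>

struct small { long a, b; };   /* two machine words: may travel in registers */
struct large { long v[8]; };   /* too big for registers on common ABIs */

struct small make_small(long a, long b)
{
    struct small s = { a, b };
    return s;
}

struct large make_large(long seed)
{
    struct large l;
    for (int i = 0; i < 8; i++)
        l.v[i] = seed + i;
    return l;
}

/* Sums `count` int arguments passed through the variadic mechanism. */
long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Comparing the output for make_small() and make_large() shows where the compiler switches from register returns to a hidden pointer argument on the target in question.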

External References

In order to call a foreign function from C, it must have a correct C prototype. Thus, if the function fee() takes the arguments fie, foe, and fum, in C calling order, and returns an integer value, then the corresponding header file should contain a prototype for fee() with matching parameter and return types.

Similarly, any global variables defined in the assembly code must be declared extern.
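A minimal sketch of such declarations follows. The parameter types (int, char, double) are an assumption inferred from how the arguments are used later in this page, and the variable name frotz is purely illustrative. The C definitions at the bottom are stand-ins so that the sketch links on its own; in a real project they would live in the assembly source.

```c
/* Declarations as they might appear in a header (types are assumed). */
int fee(int fie, char foe, double fum);   /* implemented elsewhere, e.g. in assembly */
extern int frotz;                         /* hypothetical global defined elsewhere */

/* Stand-in definitions, only here so the sketch is self-contained. */
int frotz = 42;

int fee(int fie, char foe, double fum)
{
    return fie + foe + (int)fum;
}
```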

C functions used from assembly or other languages must be declared as appropriate for that language. For example, in NASM, the C function foo() would be declared with extern foo.

Also, in most assembly languages, a function or variable that is to be exported must be declared global.

Name Mangling

In some object formats (a.out), the name of a C function is automatically mangled by prepending an underscore ("_"). Thus, to call a C function foo() from assembly with such a format, you must declare it as extern _foo instead of extern foo. This requirement does not apply to ELF; for COFF and PE it depends on the target (32-bit Windows toolchains, for example, still prepend the underscore to cdecl names).

C++ name mangling is much more severe, as the C++ compiler encodes the type information from the parameter list into the symbol. (This is what enables function overloading in C++ in the first place.) The Binutils package contains the tool c++filt, which demangles such symbols and so helps to match a mangled name to its declaration.

Registers

The general-purpose registers EBX, ESI, EDI, and EBP and the segment registers DS, ES, and SS must be preserved by the called function. If you use them, you must save them first and restore them afterwards. Conversely, EAX and EDX are used for return values, and thus should not be preserved. The other registers do not need to be saved by the called function, but if they are in use by the calling function, then the calling function should save them before the call is made and restore them afterwards.

Passing Function Arguments

GCC/x86 passes function arguments on the stack. These arguments are pushed in reverse order from their order in the argument list. Furthermore, since the x86 protected-mode stack operations operate on 32-bit values, each value is always pushed as a full 32-bit value, even if the actual value is narrower. Thus, for a function foo() taking the arguments bar, baz, and quux in that order, the value of quux (a 64-bit FP value) is pushed first as two 32-bit values, high half first so that the low half ends up at the lower address; the value of baz is pushed in the low byte of a 32-bit value; and finally bar is pushed as a 32-bit value.

To pass arguments to a C function, the calling function must push the argument values as described above. Thus, to call foo() from a NASM assembly program, you would push quux, baz, and bar in that order, execute the call, and afterwards remove the arguments from the stack again.
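The push order can be illustrated with a small simulation in C. This is only a sketch: the stack is modelled as an array of 32-bit words, the signature foo(int bar, char baz, double quux) is assumed, and all helper names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16

struct sim_stack {
    uint32_t mem[STACK_WORDS];
    int sp;                    /* index of the current top of stack */
};

static void push32(struct sim_stack *s, uint32_t value)
{
    s->mem[--s->sp] = value;   /* the stack grows toward lower addresses */
}

/* Pushes the arguments of the assumed foo(int bar, char baz, double quux)
 * last argument first, and returns the index of the slot holding bar. */
static int push_foo_args(struct sim_stack *s, int bar, char baz, double quux)
{
    uint64_t bits;
    memcpy(&bits, &quux, sizeof bits);        /* raw bit pattern of the double */

    push32(s, (uint32_t)(bits >> 32));        /* high half of quux */
    push32(s, (uint32_t)bits);                /* low half, lands at the lower address */
    push32(s, (uint32_t)(unsigned char)baz);  /* only the low byte is meaningful */
    push32(s, (uint32_t)bar);
    return s->sp;
}
```

Starting from an empty stack (sp equal to STACK_WORDS), the slot returned by push_foo_args() holds bar, the next one baz, and the two after that the low and high halves of quux, which is the layout the callee expects above its return address.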

Accessing Function Arguments

In the GCC/x86 C calling convention, the first thing any function that accepts formal arguments should do is push the value of EBP (the frame base pointer of the calling function), then copy the value of ESP to EBP. This sets the function’s own frame pointer, which is used to track both the arguments and (in C, or in any properly reentrant assembly code) the local variables.

To access arguments passed by a C function, you address them relative to EBP, at an offset equal to 4 * (n + 2), where n is the zero-indexed position of the parameter in the argument list (not the order in which it was pushed), counting one position per 32-bit slot. The + 2 accounts for the calling function’s saved frame pointer and the return address (the latter pushed automatically by CALL and popped by RET).

Thus, in function fee, fie can be loaded into EAX from [ebp + 8], foe into BL from [ebp + 12], and fum into EAX and EDX from [ebp + 16] and [ebp + 20] (in NASM syntax).
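The offset rule is simple arithmetic, sketched here as a C helper. The function name is hypothetical, and the rule assumes the standard frame setup described above (saved EBP at offset 0, return address at offset 4).

```c
#include <stddef.h>

/* Byte offset from EBP of the n-th 32-bit argument slot (zero-indexed).
 * Slots 0 and 1 of the frame hold the saved EBP and the return address,
 * hence the "+ 2". */
static size_t arg_offset_from_ebp(size_t n)
{
    return 4 * (n + 2);
}
```

For fee(), slot 0 is fie, slot 1 is foe, and a 64-bit fum would occupy slots 2 and 3.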

As stated earlier, return values in GCC are passed using EAX and EDX. If a value exceeds 64 bits, it must be returned through a pointer.


What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

The following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux:

But what are the x86-64 system call conventions on both UNIX & Linux?


I verified these using GNU Assembler (gas) on Linux.

Kernel Interface

x86-32 aka i386 Linux System Call convention:

In x86-32, parameters for a Linux system call are passed in registers: %eax holds the syscall number, and %ebx, %ecx, %edx, %esi, %edi, %ebp carry up to 6 parameters.

The return value is in %eax. All other registers (including EFLAGS) are preserved across the int $0x80.

I took the following snippet from the Linux Assembly Tutorial, but I’m doubtful about it. If anyone can show an example, it would be great.

If there are more than six arguments, %ebx must contain the memory location where the list of arguments is stored — but don’t worry about this because it’s unlikely that you’ll use a syscall with more than six arguments.

For an example and a little more reading, refer to http://www.int80h.org/bsdasm/#alternate-calling-convention. For another example, a Hello World for i386 Linux using int 0x80, see "Hello, world in assembly language with Linux system calls?"

There is a faster way to make 32-bit system calls: using sysenter. The kernel maps a page of memory into every process (the vDSO) containing the user-space side of the sysenter dance, which has to cooperate with the kernel so that it can find the return address. The argument-to-register mapping is the same as for int $0x80. You should normally call into the vDSO instead of using sysenter directly. (See The Definitive Guide to Linux System Calls for info on linking and calling into the vDSO, and for more info on sysenter and everything else to do with system calls.)

x86-32 [Free|Open|Net|DragonFly]BSD UNIX System Call convention:

Parameters are passed on the stack. Push the parameters onto the stack (last parameter first), then push an additional 32 bits of dummy data (it is not actually dummy data; refer to the following link for more info), and then issue the system call instruction int $0x80.

x86-64 Linux System Call convention:

(Note: x86-64 Mac OS X is similar but different from Linux. TODO: check what *BSD does)

Refer to section "A.2 AMD64 Linux Kernel Conventions" of the System V Application Binary Interface AMD64 Architecture Processor Supplement. The latest versions of the i386 and x86-64 System V psABIs can be found linked from this page in the ABI maintainer’s repo. (See also the x86 tag wiki for up-to-date ABI links and lots of other good stuff about x86 asm.)

Here is the snippet from this section:

User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.

A system-call is done via the syscall instruction. This clobbers %rcx and %r11 as well as the %rax return value, but other registers are preserved.

The number of the syscall has to be passed in register %rax.

System-calls are limited to six arguments, no argument is passed directly on the stack.

Returning from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error, it is -errno.

Only values of class INTEGER or class MEMORY are passed to the kernel.

Remember this is from the Linux-specific appendix to the ABI, and even for Linux it’s informative not normative. (But it is in fact accurate.)
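The kernel convention quoted above can be sketched in GNU C with inline assembly. This is an illustration rather than a replacement for libc: the helper names raw_syscall3 and is_syscall_error are invented here, only three arguments are handled, and on targets other than x86-64 Linux the code falls back to libc’s syscall() so that it still compiles.

```c
#define _GNU_SOURCE        /* for the syscall() declaration used in the fallback */
#include <errno.h>
#include <sys/syscall.h>   /* SYS_getpid, SYS_write */
#include <unistd.h>

/* Issues a system call with up to three arguments and returns the raw
 * kernel result (a value in [-4095, -1] means -errno). */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                           /* result in %rax */
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)  /* %rax, %rdi, %rsi, %rdx */
                      : "rcx", "r11", "memory");            /* clobbered by syscall */
    return ret;
#else
    /* Fallback for other targets: go through libc and rebuild the raw value. */
    long ret = syscall(nr, a1, a2, a3);
    return (ret == -1) ? -(long)errno : ret;
#endif
}

/* Returns nonzero if a raw result encodes an error (-errno). */
static int is_syscall_error(long ret)
{
    return ret >= -4095 && ret <= -1;
}
```

For instance, raw_syscall3(SYS_write, 1, (long)"hi\n", 3) writes to standard output, while passing an invalid file descriptor yields -EBADF directly in the return value instead of setting errno.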

The 32-bit int $0x80 ABI is usable in 64-bit code, but this is strongly discouraged (see "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?"). It still truncates its inputs to 32 bits, so it’s unsuitable for pointers, and it zeros r8-r11.

User Interface: function calling

x86-32 Function Calling convention:

In x86-32, parameters are passed on the stack. The last parameter is pushed first, and so on until all parameters are pushed, and then the call instruction is executed. This is used for calling C library (libc) functions on Linux from assembly.

Modern versions of the i386 System V ABI (used on Linux) require 16-byte alignment of %esp before a call, like the x86-64 System V ABI has always required. Callees are allowed to assume that and use SSE 16-byte loads/stores that fault on unaligned. But historically, Linux only required 4-byte stack alignment, so it took extra work to reserve naturally-aligned space even for an 8-byte double or something.

Some other modern 32-bit systems still don’t require more than 4 byte stack alignment.

x86-64 System V user-space Function Calling convention:

x86-64 System V passes args in registers, which is more efficient than i386 System V’s stack args convention. It avoids the latency and extra instructions of storing args to memory (cache) and then loading them back again in the callee. This works well because there are more registers available, and is better for modern high-performance CPUs where latency and out-of-order execution matter. (The i386 ABI is very old).

In this new mechanism: First the parameters are divided into classes. The class of each parameter determines the manner in which it is passed to the called function.

For complete information, refer to "3.2 Function Calling Sequence" of the System V Application Binary Interface AMD64 Architecture Processor Supplement, which reads, in part:

Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:

If the class is MEMORY, pass the argument on the stack.
If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used

So %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are the registers, in order, used to pass integer/pointer (i.e. INTEGER class) parameters to any libc function from assembly. %rdi is used for the first INTEGER parameter, %rsi for the 2nd, %rdx for the 3rd, and so on. Then the call instruction is executed. The stack (%rsp) must be 16B-aligned when call executes.

If there are more than 6 INTEGER parameters, the 7th INTEGER parameter and later are passed on the stack. (Caller pops, same as x86-32.)
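The register order can be checked from C with two tiny functions written in top-level assembly. This is a sketch assuming GCC or Clang on x86-64 Linux; the function names are made up, and a plain C definition stands in on other targets so the file still builds.

```c
/* Hypothetical helpers that hand back one of their arguments. */
long pick_third(long a, long b, long c);
long pick_seventh(long a, long b, long c, long d, long e, long f, long g);

#if defined(__linux__) && defined(__x86_64__)
__asm__(
    ".pushsection .text\n"
    ".globl pick_third\n"
    ".type pick_third, @function\n"
    "pick_third:\n"
    "    movq %rdx, %rax\n"       /* third INTEGER argument arrives in %rdx */
    "    ret\n"
    ".size pick_third, .-pick_third\n"
    ".globl pick_seventh\n"
    ".type pick_seventh, @function\n"
    "pick_seventh:\n"
    "    movq 8(%rsp), %rax\n"    /* seventh: first stack slot above the return address */
    "    ret\n"
    ".size pick_seventh, .-pick_seventh\n"
    ".popsection\n"
);
#else
/* Stand-ins for targets where the assembly above does not apply. */
long pick_third(long a, long b, long c) { (void)a; (void)b; return c; }
long pick_seventh(long a, long b, long c, long d, long e, long f, long g)
{
    (void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
    return g;
}
#endif
```

Calling pick_seventh(1, 2, 3, 4, 5, 6, 7) from C makes the compiler place the first six values in registers and the seventh on the stack, which is where the assembly version reads it.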

The first 8 floating point args are passed in %xmm0-7, later ones on the stack. There are no call-preserved vector registers. (A function with a mix of FP and integer arguments can have more than 8 total register arguments.)
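That integer and FP arguments are counted separately can be seen with a mixed signature. Again this is only a sketch for GCC or Clang on x86-64 Linux, with invented function names and a C stand-in for other targets.

```c
/* Hypothetical helpers: both take (long, double, long, double). */
long pick_second_int(long a, double x, long b, double y);
double pick_second_fp(long a, double x, long b, double y);

#if defined(__linux__) && defined(__x86_64__)
__asm__(
    ".pushsection .text\n"
    ".globl pick_second_int\n"
    ".type pick_second_int, @function\n"
    "pick_second_int:\n"
    "    movq %rsi, %rax\n"        /* b is the 2nd INTEGER arg, so it is in %rsi */
    "    ret\n"
    ".size pick_second_int, .-pick_second_int\n"
    ".globl pick_second_fp\n"
    ".type pick_second_fp, @function\n"
    "pick_second_fp:\n"
    "    movapd %xmm1, %xmm0\n"    /* y is the 2nd FP arg (%xmm1); return in %xmm0 */
    "    ret\n"
    ".size pick_second_fp, .-pick_second_fp\n"
    ".popsection\n"
);
#else
/* Stand-ins for targets where the assembly above does not apply. */
long pick_second_int(long a, double x, long b, double y)
{
    (void)a; (void)x; (void)y;
    return b;
}
double pick_second_fp(long a, double x, long b, double y)
{
    (void)a; (void)x; (void)b;
    return y;
}
#endif
```

Although b is the third parameter in the source, it is only the second INTEGER-class argument, so it travels in %rsi rather than %rdx.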

Calls to variadic functions (like printf) always need %al set to the number of FP register args (an upper bound is allowed).

There are rules for when to pack structs into registers (rdx:rax on return) vs. in memory. See the ABI for details, and check compiler output to make sure your code agrees with compilers about how something should be passed/returned.
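As one concrete case, a struct of two 64-bit integers is returned in the rdx:rax pair. The sketch below assumes GCC or Clang on x86-64 Linux; struct pair and make_pair are hypothetical names, and a C stand-in is used on other targets.

```c
struct pair { long first; long second; };   /* two INTEGER-class eightbytes */

struct pair make_pair(long first, long second);

#if defined(__linux__) && defined(__x86_64__)
__asm__(
    ".pushsection .text\n"
    ".globl make_pair\n"
    ".type make_pair, @function\n"
    "make_pair:\n"
    "    movq %rdi, %rax\n"    /* first eightbyte of the result goes in %rax */
    "    movq %rsi, %rdx\n"    /* second eightbyte goes in %rdx */
    "    ret\n"
    ".size make_pair, .-make_pair\n"
    ".popsection\n"
);
#else
/* Stand-in for targets where the assembly above does not apply. */
struct pair make_pair(long first, long second)
{
    struct pair p = { first, second };
    return p;
}
#endif
```

Larger structs are returned in memory instead, so this shortcut should not be generalized without checking the classification rules or the compiler output.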

Note that the Windows x64 function calling convention has multiple significant differences from x86-64 System V, like shadow space that must be reserved by the caller (instead of a red-zone), call-preserved xmm6-xmm15, and very different rules for which arg goes in which register.