Calling conventions in Linux

Contents

yamnikov-oleg / calling_conventions.md
What are the calling conventions for UNIX & Linux system calls on i386 and x86-64
Examples
Calling convention variation
Comparing x86-32 and x86-64 bit
Calling conventions in Linux
ELF vs a.out problems
Direct Linux syscalls
Hardware I/O under Linux
Accessing 16-bit drivers from Linux/i386

yamnikov-oleg / calling_conventions.md

Source: man syscall

Architecture calling conventions

Every architecture has its own way of invoking and passing arguments to the kernel. The details for various architectures are listed in the two tables below.

The first table lists the instruction used to transition to kernel mode (which might not be the fastest or best way to transition to the kernel, so you might have to refer to vdso(7)), the register used to indicate the system call number, and the register used to return the system call result.

arch/ABI	instruction	syscall #	retval	Notes
arm/OABI	swi NR	—	a1	NR is syscall #
arm/EABI	swi 0x0	r7	r0
arm64	svc #0	x8	x0
blackfin	excpt 0x0	P0	R0
i386	int $0x80	eax	eax
ia64	break 0x100000	r15	r8	See below
mips	syscall	v0	v0	See below
parisc	ble 0x100(%sr2, %r0)	r20	r28
s390	svc 0	r1	r2	See below
s390x	svc 0	r1	r2	See below
sparc/32	t 0x10	g1	o0
sparc/64	t 0x6d	g1	o0
x86_64	syscall	rax	rax	See below
x32	syscall	rax	rax	See below

For s390 and s390x, NR (the system call number) may be passed directly with "svc NR" if it is less than 256.

The x32 ABI uses the same instruction as the x86_64 ABI and is used on the same processors. To differentiate between them, the bit mask __X32_SYSCALL_BIT is bitwise-ORed into the system call number for system calls under the x32 ABI.
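As a small illustration of the x32 rule above, the sketch below ORs the marker bit into a system call number. The value 0x40000000 is assumed to match what the x86 kernel headers define for __X32_SYSCALL_BIT; the names X32_BIT_ASSUMED, x32_syscall_number and is_x32_syscall are hypothetical.

```c
#include <assert.h>

/* Assumed value of __X32_SYSCALL_BIT from the x86 kernel headers. */
#define X32_BIT_ASSUMED 0x40000000L

/* Hypothetical helper: turn a plain syscall number into its x32 form. */
static long x32_syscall_number(long nr)
{
    assert(nr >= 0);                 /* syscall numbers are non-negative */
    return nr | X32_BIT_ASSUMED;
}

/* Hypothetical helper: report whether a number carries the x32 marker. */
static int is_x32_syscall(long nr)
{
    return (nr & X32_BIT_ASSUMED) != 0;
}
```

With these helpers, x32_syscall_number(1) yields 0x40000001, and is_x32_syscall() tells the two ABIs apart by looking at that single bit.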

On a few architectures, a register is used to indicate simple boolean failure of the system call: ia64 uses r10 for this purpose, and mips uses a3.

The second table shows the registers used to pass the system call arguments.

arch/ABI	arg1	arg2	arg3	arg4	arg5	arg6	arg7	Notes
arm/OABI	a1	a2	a3	a4	v1	v2	v3
arm/EABI	r0	r1	r2	r3	r4	r5	r6
arm64	x0	x1	x2	x3	x4	x5	—
blackfin	R0	R1	R2	R3	R4	R5	—
i386	ebx	ecx	edx	esi	edi	ebp	—
ia64	out0	out1	out2	out3	out4	out5	—
mips/o32	a0	a1	a2	a3	—	—	—	See below
mips/n32,64	a0	a1	a2	a3	a4	a5	—
parisc	r26	r25	r24	r23	r22	r21	—
s390	r2	r3	r4	r5	r6	r7	—
s390x	r2	r3	r4	r5	r6	r7	—
sparc/32	o0	o1	o2	o3	o4	o5	—
sparc/64	o0	o1	o2	o3	o4	o5	—
x86_64	rdi	rsi	rdx	r10	r8	r9	—
x32	rdi	rsi	rdx	r10	r8	r9	—

The mips/o32 system call convention passes arguments 5 through 8 on the user stack.

Note that these tables don’t cover the entire calling convention — some architectures may indiscriminately clobber other registers not listed here.
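To make the x86_64 rows of the two tables concrete, here is a minimal sketch using GCC-style inline assembly. The helper name raw_syscall3 is hypothetical. On targets other than x86_64 the sketch falls back to the libc syscall() function, whose error reporting differs (it returns -1 and sets errno instead of returning a negative error code).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid and friends */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: system call with up to three arguments.
 * On x86_64 it returns the raw kernel result; elsewhere it returns
 * whatever libc's syscall() returns. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    assert(nr >= 0);
#if defined(__x86_64__) && !defined(__ILP32__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                           /* result comes back in rax */
        : "a"(nr), "D"(a1), "S"(a2), "d"(a3)  /* rax = number; rdi, rsi, rdx = args 1-3 */
        : "rcx", "r11", "memory");            /* the syscall instruction overwrites rcx and r11 */
    return ret;
#else
    return syscall(nr, a1, a2, a3);           /* portable fallback through libc */
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid(). The clobber list reflects the closing remark above: the listed registers are not the only ones an architecture may overwrite.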

Review: cb4c4e8 on 2 Dec 2015.

32-bit system call numbers and entry vectors

Source

What are the calling conventions for UNIX & Linux system calls on i386 and x86-64

A system call is the fundamental interface between an application and the Linux kernel. Whenever a Unix/Linux program performs file I/O, transfers data over the network, or starts a process, and so directly or indirectly needs a service that only the kernel can provide, a system call is involved. Programs usually make these calls through wrapper functions provided by a C library such as glibc.

Examples

Below is a list of some frequently used system calls and their purpose.

Sr.No	System Call	Purpose
1	chmod	change permissions of a file
2	chdir	change working directory
3	fork	create a child process
4	unlink	delete a name and possibly the file it refers to

A systems programmer usually writes programs that do not make the system call directly; instead, the program specifies which system call to use and a wrapper performs the actual transition to the kernel. This relies on a calling convention, which depends on the hardware architecture of the system the kernel runs on. Hence different architectures have different calling conventions.

A calling convention is an implementation-level design for how subroutines receive parameters from their caller and how the results are returned. Differences in various implementations include where parameters, return values, return addresses and scope links are placed (registers, stack or memory etc.), and how the tasks of preparing for a function call and restoring the environment afterward are divided between the caller and the callee.

Calling convention variation

Calling conventions vary between architectures in the following respects:

Which registers the called function must preserve for the caller.
How the task of setting up for and cleaning up after a function call is divided between the caller and the callee.
How the return value is delivered from the callee back to the caller: on the stack, in a register, or on the heap.
Where parameters, return values and return addresses are placed.
The order in which actual arguments for formal parameters are passed.

Comparing x86-32 and x86-64 bit

A single CPU architecture almost always has more than one possible calling convention, but the industry has settled on a common approach across vendors. 32-bit x86 has 8 general-purpose 32-bit registers, while x86-64 widens them to 64 bits and adds 8 more (r8 to r15). Hence the calling conventions are implemented differently: on Linux, 32-bit code traditionally passes function arguments on the stack, whereas x86-64 code passes the first six integer arguments in registers (rdi, rsi, rdx, rcx, r8, r9).

Source

Calling conventions in Linux

Linking your assembly with GCC-compiled code is the preferred way if you are developing a mixed C-asm project. Check the GCC docs and examples from Linux kernel .S files that go through gas (not those that go through as86).

32-bit arguments are pushed onto the stack in reverse syntactic order (hence accessed/popped in the right order), above the 32-bit near return address. %ebp, %esi, %edi, and %ebx are callee-saved; other registers are caller-saved. %eax holds the result, or %edx:%eax for 64-bit results.

FP stack: I'm not sure, but I think the result is in st(0), with the whole stack caller-saved.

Note that GCC has options to modify the calling conventions by reserving registers, having arguments in registers, not assuming the FPU, etc. Check the i386 .info pages.

Beware that you must then declare the cdecl or regparm(0) attribute for a function that will follow standard GCC calling conventions. See C Extensions::Extended Asm:: section from the GCC info pages. See also how Linux defines its asmlinkage macro.
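A minimal sketch of the idea, assuming a GCC-compatible compiler: the macro MY_ASMLINKAGE below is hypothetical and only modelled on the way the kernel's asmlinkage forces stack-based arguments on i386. On other architectures it expands to nothing, because regparm is an i386-specific attribute.

```c
#include <assert.h>

/* Hypothetical macro modelled on the kernel's asmlinkage for i386:
 * keep all arguments on the stack even if the file is compiled with
 * -mregparm. On other targets it expands to nothing. */
#if defined(__i386__)
#define MY_ASMLINKAGE __attribute__((regparm(0)))
#else
#define MY_ASMLINKAGE
#endif

/* On i386, hand-written assembly could call this by pushing c, b, a
 * (in that order) and issuing `call sum3`; the result arrives in %eax. */
MY_ASMLINKAGE int sum3(int a, int b, int c)
{
    return a + b + c;
}

/* Small self-check that a C caller can run; C code calls the function
 * normally because the compiler applies the attribute on both sides. */
static void sum3_selfcheck(void)
{
    assert(sum3(1, 2, 3) == 6);
}
```

Declaring the attribute on the prototype seen by both the C and the assembly side is what keeps the two in agreement when compiler options change the default convention.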

ELF vs a.out problems

Some C compilers prepend an underscore before every symbol, while others do not.

Particularly, Linux a.out GCC does such prepending, while Linux ELF GCC does not.

If you need to cope with both behaviors at once, see how existing packages do it. For instance, get an old Linux source tree, Elk, qthreads, or OCaml.

You can also override the implicit C->asm renaming by inserting statements like

    void foo(void) asm("bar");

to be sure that the C function foo() will really be called bar in assembly.
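The renaming described above relies on the GCC asm label extension. The sketch below assumes a GCC-compatible compiler and uses the __asm__ spelling so that it also compiles in strict ISO C modes; the helper foo_selfcheck is hypothetical.

```c
#include <assert.h>

/* Declare foo() and tell the compiler that its assembly-level symbol
 * is "bar". The asm label goes on a declaration, after the declarator. */
int foo(void) __asm__("bar");

/* The definition is emitted under the symbol name chosen above. */
int foo(void)
{
    return 42;
}

/* C code keeps using the C name; only the object file sees "bar". */
static void foo_selfcheck(void)
{
    assert(foo() == 42);
}
```

Running nm on the resulting object file should list the function under the name given in the label rather than foo, regardless of whether the target format normally prepends an underscore.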

Note that the objcopy utility from the binutils package should allow you to transform your a.out objects into ELF objects, and perhaps the contrary too, in some cases. More generally, it will do lots of file format conversions.

Direct Linux syscalls

Here is a summary of the pros and cons of direct system calls.

Pros:

the smallest possible size; squeezing the last byte out of the system

the highest possible speed; squeezing cycles out of your favorite benchmark

full control: you can adapt your program/library to your specific language or memory requirements or whatever

no pollution by libc cruft

no pollution by C calling conventions (if you’re developing your own language or environment)

static binaries make you independent from libc upgrades or crashes, or from dangling #! path to an interpreter (and are faster)

just for the fun out of it (don’t you get a kick out of assembly programming?)

Cons:

If any other program on your computer uses the libc, then duplicating the libc code will actually waste memory, not save it.

Services redundantly implemented in many static binaries are a waste of memory. But you can make your libc replacement a shared library.

Size is much better saved by having some kind of bytecode, wordcode, or structure interpreter than by writing everything in assembly. (the interpreter itself could be written either in C or assembly.) The best way to keep multiple binaries small is to not have multiple binaries, but instead to have an interpreter process files with #! prefix. This is how OCaml works when used in wordcode mode (as opposed to optimized native code mode), and it is compatible with using the libc. This is also how Tom Christiansen’s Perl PowerTools reimplementation of unix utilities works. Finally, one last way to keep things small, that doesn’t depend on an external file with a hardcoded path, be it library or interpreter, is to have only one binary, and have multiply-named hard or soft links to it: the same binary will provide everything you need in an optimal space, with no redundancy of subroutines or useless binary headers; it will dispatch its specific behavior according to its argv[0] ; in case it isn’t called with a recognized name, it might default to a shell, and be possibly thus also usable as an interpreter!
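The single-binary approach with multiple links can be sketched as follows. The applet names and the helpers applet_name and dispatch are hypothetical; a real tool would register many more applets and might start a shell for unknown names, as the paragraph above suggests.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical applets bundled into one binary. */
static int applet_true(void)  { return 0; }
static int applet_false(void) { return 1; }

/* Strip any directory part from argv[0], e.g. "/bin/true" -> "true". */
static const char *applet_name(const char *argv0)
{
    assert(argv0 != NULL);
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Dispatch on the name the binary was invoked under.
 * Returns the applet's exit status, or 127 for an unrecognized name. */
static int dispatch(const char *argv0)
{
    const char *name = applet_name(argv0);
    if (strcmp(name, "true") == 0)
        return applet_true();
    if (strcmp(name, "false") == 0)
        return applet_false();
    return 127;
}
```

A main() function would then simply return dispatch(argv[0]), and hard or symbolic links named true and false pointing at the one binary select the behavior.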

You cannot benefit from the many functionalities that libc provides besides mere linux syscalls: that is, functionality described in section 3 of the manual pages, as opposed to section 2, such as malloc, threads, locale, password, high-level network management, etc.

Therefore, you might have to reimplement large parts of libc, from printf() to malloc() and gethostbyname(). This duplicates the libc effort and can be quite boring at times. Note that some people have already written "light" replacements for parts of the libc; check them out! (Red Hat's minilibc, Rick Hohensee's libsys, Felix von Leitner's dietlibc; the asmutils project is working on a pure assembly libc.)

Static libraries prevent you from benefiting from libc upgrades as well as from libc add-ons such as the zlibc package, which does on-the-fly transparent decompression of gzip-compressed files.

The few instructions added by the libc can be a ridiculously small speed overhead as compared to the cost of a system call. If speed is a concern, your main problem is in your usage of system calls, not in their wrapper’s implementation.

Using the standard assembly API for system calls is much slower than using the libc API when running in micro-kernel versions of Linux such as L4Linux, that have their own faster calling convention, and pay high convention-translation overhead when using the standard one (L4Linux comes with libc recompiled with their syscall API; of course, you could recompile your code with their API, too).

See the previous discussion of general speed optimization issues.

If syscalls are too slow for you, you might want to hack the kernel sources (in C) instead of staying in userland.

If you’ve pondered the above pros and cons, and still want to use direct syscalls, then here is some advice.

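You can easily define your system calling functions in a portable way in C (as opposed to unportably in assembly) by including asm/unistd.h and using the provided macros. Note that recent kernel headers no longer ship the _syscallN macros to user space; the libc syscall() function covers the same need.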

Since you’re trying to replace it, go get the sources for the libc, and grok them. (And if you think you can do better, then send feedback to the authors!)

As an example of pure assembly code that does everything you want, examine Linux assembly resources.

Basically, you issue an int 0x80, with the __NR_syscallname number (from asm/unistd.h) in eax, and parameters (up to six) in ebx, ecx, edx, esi, edi, ebp respectively.

The result is returned in eax, with a negative result being an error, whose opposite is what libc would put into errno. The user stack is not touched, so you needn't have a valid one when doing a syscall.

Passing a sixth parameter in ebp appeared in Linux 2.4; previous Linux versions understand only 5 parameters in registers.
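The register assignment and the error convention described above can be sketched as below. The helper names linux_syscall3 and to_libc_result are hypothetical. The int $0x80 path is only compiled on i386; on other targets the sketch goes through the libc syscall() function and rebuilds the raw convention. The range -4095..-1 used for error detection is the kernel's errno range, a detail not stated in the text above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>   /* SYS_close */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper returning the raw kernel result:
 * >= 0 on success, -errno on failure. */
static long linux_syscall3(long nr, long a1, long a2, long a3)
{
    assert(nr >= 0);
#if defined(__i386__)
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)                           /* result in eax */
                      : "a"(nr), "b"(a1), "c"(a2), "d"(a3)  /* eax = number; ebx, ecx, edx = args */
                      : "memory");
    return ret;
#else
    /* Not i386: go through libc and rebuild the raw convention. */
    long ret = syscall(nr, a1, a2, a3);
    return (ret == -1) ? -(long)errno : ret;
#endif
}

/* Convert a raw result the way a libc wrapper would:
 * values in -4095..-1 become -1 with errno set to the opposite. */
static long to_libc_result(long raw)
{
    if (raw < 0 && raw >= -4095L) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

Closing an invalid descriptor, linux_syscall3(SYS_close, -1, 0, 0), should yield -EBADF, which to_libc_result() turns into -1 with errno set to EBADF.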

As for the invocation arguments passed to a process upon startup, the general principle is that the stack originally contains the number of arguments argc, then the list of pointers that constitute *argv (terminated by a null pointer), then a null-terminated sequence of pointers to null-terminated variable=value strings for the environment. For more details, do examine Linux assembly resources, read the sources of C startup code from your libc (crt0.S or crt1.S), or those from the Linux kernel (exec.c and binfmt_*.c in linux/fs/).
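The startup layout described above means the environment vector can be found directly behind argv. A minimal sketch, with the hypothetical helpers env_after_argv and vector_length:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given argc and argv as laid out at process start,
 * return the environment pointer array that follows argv's closing NULL. */
static char **env_after_argv(int argc, char **argv)
{
    assert(argc >= 0 && argv != NULL);
    assert(argv[argc] == NULL);        /* argv is NULL-terminated */
    return argv + argc + 1;
}

/* Count the entries in a NULL-terminated pointer vector. */
static size_t vector_length(char **vec)
{
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}
```

Called from main() with the unmodified argc and argv, env_after_argv() is expected to point at the same array that startup code hands over as the environment, assuming nothing has relocated the environment since the process started.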

Hardware I/O under Linux

If you want to perform direct port I/O under Linux, either it’s something very simple that does not need OS arbitration, and you should see the IO-Port-Programming mini-HOWTO; or it needs a kernel device driver, and you should try to learn more about kernel hacking, device driver development, kernel modules, etc, for which there are other excellent HOWTOs and documents from the LDP.
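For the simple case, one common user-space approach on x86 with glibc is to request access with ioperm() and then use the inb()/outb() helpers from sys/io.h. The sketch below assumes that setup; the helper name read_port_byte is hypothetical, and the call normally fails with EPERM unless the process runs with root privileges.

```c
#include <assert.h>
#include <errno.h>
#if defined(__i386__) || defined(__x86_64__)
#include <sys/io.h>     /* ioperm(), inb(): available on x86 */
#endif

/* Hypothetical helper: read one byte from an I/O port.
 * Returns 0..255 on success, or -1 with errno set on failure. */
static int read_port_byte(unsigned int port)
{
    assert(port <= 0xFFFFu);            /* x86 I/O port numbers are 16-bit */
#if defined(__i386__) || defined(__x86_64__)
    if (ioperm(port, 1, 1) != 0)        /* ask the kernel for access to one port */
        return -1;                      /* usually EPERM without privileges */
    int value = inb((unsigned short)port);
    ioperm(port, 1, 0);                 /* drop the permission again */
    return value;
#else
    errno = ENOSYS;                     /* no port I/O on this target */
    return -1;
#endif
}
```

Anything that needs interrupts, DMA, or arbitration between several programs goes beyond this sketch and belongs in a kernel driver, as noted above.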

Particularly, if what you want is Graphics programming, then do join one of the GGI or XFree86 projects.

Some people have even done better, writing small and robust XFree86 drivers in an interpreted domain-specific language, GAL, and achieving the efficiency of hand C-written drivers through partial evaluation (drivers not only not in asm, but not even in C!). The problem is that the partial evaluator they used to achieve efficiency is not free software. Any taker for a replacement?

Anyway, in all these cases, you'll be better off using GCC inline assembly with the macros from linux/asm/*.h than writing full assembly source files.

Accessing 16-bit drivers from Linux/i386

Such a thing is theoretically possible (proof: see how DOSEMU can selectively grant hardware port access to programs), and I've heard rumors that someone somewhere did actually do it (in the PCI driver? Some VESA access stuff? ISA PnP? dunno). If you have more precise information on that, you'll be most welcome. Anyway, good places to look for more information are the Linux kernel sources, DOSEMU sources, and sources for various low-level programs under Linux (perhaps GGI, if it supports VESA).

Basically, you must either use 16-bit protected mode or vm86 mode.

The first is simpler to set up, but only works with well-behaved code that won't do any kind of segment arithmetic or absolute segment addressing (particularly addressing segment 0), unless by chance all segments used can be set up in advance in the LDT.

The latter allows for more "compatibility" with vanilla 16-bit environments, but requires more complicated handling.

In both cases, before you can jump to 16-bit code, you must

mmap any absolute address used in the 16-bit code (such as ROM, video buffers, DMA targets, and memory-mapped I/O) from /dev/mem to your process' address space,

set up the LDT and/or vm86 mode monitor,

grab proper I/O permissions from the kernel (see the above section).
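The first of these steps can be sketched as below. The helper name map_physical and its dev parameter are hypothetical; in practice dev would be "/dev/mem", which needs root privileges and may be restricted by the kernel configuration.

```c
#include <assert.h>
#include <fcntl.h>      /* open() */
#include <stddef.h>
#include <sys/mman.h>   /* mmap() */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* close() */

/* Hypothetical helper: map `len` bytes of physical memory starting at the
 * page-aligned address `phys`, read-only, through the memory device `dev`.
 * Returns NULL on failure. */
static void *map_physical(const char *dev, off_t phys, size_t len)
{
    assert(dev != NULL && len > 0);
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return NULL;                /* device missing or access denied */
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, phys);
    close(fd);                      /* the mapping stays valid after close() */
    return (p == MAP_FAILED) ? NULL : p;
}
```

For instance, map_physical("/dev/mem", 0xA0000, 4096) would try to expose the start of the legacy video buffer. Code that relies on absolute addresses would additionally need the mapping placed at the matching virtual address (for example with MAP_FIXED), which this sketch leaves out.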

Again, carefully read the source for the stuff contributed to the DOSEMU project, particularly these mini-emulators for running ELKS and/or simple .COM programs under Linux/i386.

Source