- 5. Kernel level exception handling
- Understanding the Linux Kernel, 3rd Edition by Daniel P. Bovet, Marco Cesati
- Exception Handling
- Saving the Registers for the Exception Handler
- Entering and Leaving the Exception Handler
- Implement managed exception handling on Linux and Mac #3946
- Comments
- sergiy-k commented Feb 11, 2015
- kangaroo commented Feb 21, 2015
- janvorli commented Feb 21, 2015
- Managed exception handling on Linux
- Exceptions thrown by the managed code
- Exceptions thrown by the runtime methods called from the managed code
- Exceptions thrown by PInvoked functions
- Exceptions thrown by managed code invoked from runtime
- Hardware exceptions
5. Kernel level exception handling
When a process runs in kernel mode, it often has to access user mode memory whose address has been passed by an untrusted program. To protect itself the kernel has to verify this address.
In older versions of Linux this was done with the int verify_area(int type, const void * addr, unsigned long size) function (which has since been replaced by access_ok()).
This function verified that the memory area starting at address 'addr' and of size 'size' was accessible for the operation specified in type (read or write). To do this, verify_area had to look up the virtual memory area (vma) that contained the address addr. In the normal case (a correctly working program), this test was successful. It only failed for a few buggy programs. In some kernel profiling tests, this normally unneeded verification used up a considerable amount of time.
To overcome this situation, Linus decided to let the virtual memory hardware present in every Linux-capable CPU handle this test.
How does this work?
Whenever the kernel tries to access an address that is currently not accessible, the CPU generates a page fault exception and calls the page fault handler, do_page_fault, in arch/x86/mm/fault.c. The parameters on the stack are set up by the low-level assembly glue in arch/x86/entry/entry_32.S. The parameter regs is a pointer to the saved registers on the stack; error_code contains a reason code for the exception.
do_page_fault first obtains the inaccessible address from the CPU control register CR2. If the address is within the virtual address space of the process, the fault probably occurred because the page was not swapped in, was write-protected, or something similar. However, we are interested in the other case: the address is not valid, and there is no vma that contains this address. In this case, the kernel jumps to the bad_area label.
There it uses the address of the instruction that caused the exception (i.e. regs->eip) to find an address where the execution can continue (fixup). If this search is successful, the fault handler modifies the return address (again regs->eip) and returns. The execution will continue at the address in fixup.
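The lookup described above can be sketched in user-space C. This is only a minimal model, not the kernel's code: the struct names, the sorted table layout, and the binary search are assumptions made for illustration, and fake_regs stands in for the saved register frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one exception table entry. */
struct ex_entry {
    unsigned long insn;   /* address of an instruction that may fault */
    unsigned long fixup;  /* address where execution should continue */
};

/* Hypothetical stand-in for the saved registers (only eip is modelled). */
struct fake_regs {
    unsigned long eip;
};

/* Binary search over a table sorted by insn (assumed layout).
 * Returns the fixup address, or 0 when no entry matches. */
unsigned long find_fixup(const struct ex_entry *tbl, size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].insn == addr)
            return tbl[mid].fixup;
        if (tbl[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Models the bad_area path: returns 1 and redirects eip when a fixup
 * exists, returns 0 (eip untouched) when the fault cannot be fixed up. */
int handle_bad_area(struct fake_regs *regs, const struct ex_entry *tbl, size_t n)
{
    unsigned long fixup;

    assert(regs != NULL);
    fixup = find_fixup(tbl, n, regs->eip);
    if (fixup == 0)
        return 0;
    regs->eip = fixup;   /* execution will continue at the fixup code */
    return 1;
}
```

Calling handle_bad_area with a faulting eip that appears in the table rewrites eip to the matching fixup address; any other eip is left alone, which corresponds to the case where the search fails.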
Where does fixup point to?
Since we jump to the contents of fixup, fixup obviously points to executable code. This code is hidden inside the user access macros. I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h as an example. The definition is somewhat hard to follow, so let’s peek at the code generated by the preprocessor and the compiler. I selected the get_user call in drivers/char/sysrq.c for a detailed examination.
The original code in sysrq.c line 587:
The preprocessor output (edited to become somewhat readable):
WOW! Black GCC/assembly magic. This is impossible to follow, so let’s see what code gcc generates:
The optimizer does a good job and gives us something we can actually understand. Can we? The actual user access is quite obvious. Thanks to the unified address space we can just access the address in user memory. But what does the .section stuff do?
To understand this we have to look at the final kernel:
There are obviously two non-standard ELF sections in the generated object file. But first we want to find out what happened to our code in the final kernel executable:
The whole user memory access is reduced to 10 x86 machine instructions. The instructions bracketed in the .section directives are no longer in the normal execution path. They are located in a different section of the executable file:
Understanding the Linux Kernel, 3rd Edition by Daniel P. Bovet, Marco Cesati
Exception Handling
Most exceptions issued by the CPU are interpreted by Linux as error conditions. When one of them occurs, the kernel sends a signal to the process that caused the exception to notify it of an anomalous condition. If, for instance, a process performs a division by zero, the CPU raises a “Divide error” exception, and the corresponding exception handler sends a SIGFPE signal to the current process, which then takes the necessary steps to recover or (if no signal handler is set for that signal) abort.
There are a couple of cases, however, where Linux exploits CPU exceptions to manage hardware resources more efficiently. A first case is already described in the section “Saving and Loading the FPU, MMX, and XMM Registers” in Chapter 3. The “Device not available” exception is used together with the TS flag of the cr0 register to force the kernel to load the floating point registers of the CPU with new values. A second case involves the “Page Fault” exception, which is used to defer allocating new page frames to the process until the last possible moment. The corresponding handler is complex because the exception may, or may not, denote an error condition (see the section “Page Fault Exception Handler” in Chapter 9).
Exception handlers have a standard structure consisting of three steps:
1. Save the contents of most registers in the Kernel Mode stack (this part is coded in assembly language).
2. Handle the exception by means of a high-level C function.
3. Exit from the handler by means of the ret_from_exception( ) function.
To take advantage of exceptions, the IDT must be properly initialized with an exception handler function for each recognized exception. It is the job of the trap_init( ) function to insert the final values—the functions that handle the exceptions—into all IDT entries that refer to nonmaskable interrupts and exceptions. This is accomplished through the set_trap_gate( ) , set_intr_gate( ) , set_system_gate( ) , set_system_intr_gate( ) , and set_task_gate( ) functions:
The “Double fault” exception is handled by means of a task gate instead of a trap or system gate, because it denotes a serious kernel misbehavior. Thus, the exception handler that tries to print out the register values does not trust the current value of the esp register. When such an exception occurs, the CPU fetches the Task Gate Descriptor stored in the entry at index 8 of the IDT. This descriptor points to the special TSS segment descriptor stored in the 32nd entry of the GDT. Next, the CPU loads the eip and esp registers with the values stored in the corresponding TSS segment. As a result, the processor executes the doublefault_fn() exception handler on its own private stack.
Now we will look at what a typical exception handler does once it is invoked. Our description of exception handling will be a bit sketchy for lack of space. In particular we won’t be able to cover:
- The signal codes (see Table 11-8 in Chapter 11) sent by some handlers to the User Mode processes.
- Exceptions that occur when the kernel is operating in MS-DOS emulation mode (vm86 mode), which must be dealt with differently.
Saving the Registers for the Exception Handler
Let’s use handler_name to denote the name of a generic exception handler. (The actual names of all the exception handlers appear on the list of macros in the previous section.) Each exception handler starts with the following assembly language instructions:
If the control unit is not supposed to automatically insert a hardware error code on the stack when the exception occurs, the corresponding assembly language fragment includes a pushl $0 instruction to pad the stack with a null value. Then the address of the high-level C function is pushed on the stack; its name consists of the exception handler name prefixed by do_ .
The assembly language fragment labeled as error_code is the same for all exception handlers except the one for the “Device not available” exception (see the section “Saving and Loading the FPU, MMX, and XMM Registers” in Chapter 3). The code performs the following steps:
1. Saves the registers that might be used by the high-level C function on the stack.
2. Issues a cld instruction to clear the direction flag DF of eflags, thus making sure that autoincreases on the edi and esi registers will be used with string instructions. [*]
3. Copies the hardware error code saved in the stack at location esp+36 in edx. Stores the value -1 in the same stack location. As we’ll see in the section “Reexecution of System Calls” in Chapter 11, this value is used to separate 0x80 exceptions from other exceptions.
4. Loads edi with the address of the high-level do_handler_name( ) C function saved in the stack at location esp+32; writes the contents of es in that stack location.
5. Loads in the eax register the current top location of the Kernel Mode stack. This address identifies the memory cell containing the last register value saved in step 1.
6. Loads the user data Segment Selector into the ds and es registers.
7. Invokes the high-level C function whose address is now stored in edi.
The invoked function receives its arguments from the eax and edx registers rather than from the stack. We have already run into a function that gets its arguments from the CPU registers: the __switch_to( ) function, discussed in the section “Performing the Process Switch” in Chapter 3.
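The way the two stack slots change hands can be modelled in plain C. The sketch below is a user-space illustration only: fake_frame, slot32, slot36, enter_exception, error_code_fragment, and do_example are hypothetical names, and local variables named edx and edi stand in for the registers. It covers the pushl $0 padding and the slot swaps of the error_code fragment, not the register saving or segment loading.

```c
#include <assert.h>
#include <stddef.h>

struct fake_frame;
typedef void (*c_handler_t)(struct fake_frame *regs, long error_code);

/* Hypothetical model of the two stack slots the text calls esp+32 and esp+36. */
struct fake_frame {
    union {
        c_handler_t handler;  /* on entry: address of the do_ C function */
        long es;              /* afterwards: the saved es value */
    } slot32;
    long slot36;              /* on entry: error code (or 0 padding); afterwards: -1 */
};

/* Builds the frame as the entry stub would: when the CPU pushed no
 * hardware error code, the slot is padded with 0 (the pushl $0 case). */
struct fake_frame enter_exception(c_handler_t handler, int cpu_pushed_code, long hw_code)
{
    struct fake_frame f;

    f.slot32.handler = handler;
    f.slot36 = cpu_pushed_code ? hw_code : 0;
    return f;
}

/* Models the slot handling and the final call of the common fragment. */
void error_code_fragment(struct fake_frame *f, long current_es)
{
    long edx;
    c_handler_t edi;

    assert(f != NULL && f->slot32.handler != NULL);
    edx = f->slot36;            /* copy the hardware error code */
    f->slot36 = -1;             /* mark the slot with -1 */
    edi = f->slot32.handler;    /* fetch the C function address */
    f->slot32.es = current_es;  /* reuse the slot for es */
    edi(f, edx);                /* arguments travel in "eax" and "edx" */
}

/* Example high-level handler that records what it was given. */
static long last_error_code;
static struct fake_frame *last_regs;

void do_example(struct fake_frame *regs, long error_code)
{
    last_regs = regs;
    last_error_code = error_code;
}
```

After error_code_fragment returns, the frame no longer holds the handler address or the error code: both were moved into the "registers" before the slots were overwritten, which is why the C function must take its arguments from eax and edx.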
Entering and Leaving the Exception Handler
As already explained, the names of the C functions that implement exception handlers always consist of the prefix do_ followed by the handler name. Most of these functions invoke the do_trap() function to store the hardware error code and the exception vector in the process descriptor of current, and then send a suitable signal to that process:
The current process takes care of the signal right after the termination of the exception handler. The signal will be handled either in User Mode by the process’s own signal handler (if it exists) or in Kernel Mode. In the latter case, the kernel usually kills the process (see Chapter 11). The signals sent by the exception handlers are listed in Table 4-1.
The exception handler always checks whether the exception occurred in User Mode or in Kernel Mode and, in the latter case, whether it was due to an invalid argument passed to a system call. We’ll describe in the section “Dynamic Address Checking: The Fix-up Code” in Chapter 10 how the kernel defends itself against invalid arguments passed to system calls. Any other exception raised in Kernel Mode is due to a kernel bug. In this case, the exception handler knows the kernel is misbehaving. In order to avoid data corruption on the hard disks, the handler invokes the die( ) function, which prints the contents of all CPU registers on the console (this dump is called a kernel oops) and terminates the current process by calling do_exit( ) (see “Process Termination” in Chapter 3).
When the C function that implements the exception handling terminates, the code performs a jmp instruction to the ret_from_exception( ) function. This function is described in the later section “Returning from Interrupts and Exceptions.”
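The decision just described has three outcomes, which a small user-space model can make explicit. Everything here is a hypothetical simplification: fake_task stands in for the process descriptor, handle_trap for the high-level handler, and the has_fixup flag for the result of the fix-up search. It is not the kernel's do_trap().

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

enum outcome {
    OUTCOME_SIGNAL,  /* fault came from User Mode: notify the process */
    OUTCOME_FIXUP,   /* Kernel Mode fault caused by a bad system call argument */
    OUTCOME_OOPS     /* any other Kernel Mode fault: kernel bug */
};

/* Hypothetical stand-in for the fields the handler records for current. */
struct fake_task {
    long error_code;
    int trap_no;
    int pending_signal;  /* 0 means no signal queued */
};

enum outcome handle_trap(struct fake_task *task, int trap_no, int signo,
                         long error_code, int from_user_mode, int has_fixup)
{
    assert(task != NULL);
    if (from_user_mode) {
        /* Record the error code and vector, then queue the signal. */
        task->error_code = error_code;
        task->trap_no = trap_no;
        task->pending_signal = signo;
        return OUTCOME_SIGNAL;
    }
    if (has_fixup)
        return OUTCOME_FIXUP;  /* resume at the fix-up code */
    return OUTCOME_OOPS;       /* the real kernel would call die( ) here */
}
```

For a division by zero in User Mode, the call would pass vector 0 and SIGFPE, and the model leaves SIGFPE pending on the task; the same fault in Kernel Mode without a fix-up lands in the oops branch.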
[*] A single assembly language “string instruction,” such as rep;movsb, is able to act on a whole block of data (string).
Implement managed exception handling on Linux and Mac #3946
Comments
sergiy-k commented Feb 11, 2015
We need to implement throwing and catching exceptions in managed code for Linux and Mac platforms. This work depends on implementation of stack unwinding (https://github.com/dotnet/coreclr/issues/177) on these platforms.
@janvorli Could you please provide design proposal and name of a feature branch when you are ready to start working on this issue? Thank you.
kangaroo commented Feb 21, 2015
@janvorli What's remaining? Any chance of the design proposal @sergiy-k asked for?
janvorli commented Feb 21, 2015
@kangaroo @AndyAyersMS @pgavlin @jkotas
Here is a brief overview of the plan. I am leaving for a week of vacation today, so I’ll be able to respond to comments after I return. @sergiy-k said he might do some work on this while I’m away.
Managed exception handling on Linux
Exceptions thrown by the managed code
Managed code throws exceptions by calling IL_Throw FCALL, which uses RaiseException to perform the actual throwing. The FCALL body is wrapped between the HELPER_METHOD_FRAME_BEGIN_ATTRIB_NOPOLL / HELPER_METHOD_FRAME_END macros that internally use INSTALL_UNWIND_AND_CONTINUE_HANDLER / UNINSTALL_UNWIND_AND_CONTINUE_HANDLER .
On Windows, the exception raised by RaiseException is handled by the native Windows unwinding, since the jitter provides Windows with the unwinding info necessary for unwinding the jitted code.
On Linux, we need to catch the PAL_SEHException in the INSTALL_UNWIND_AND_CONTINUE_HANDLER (resp. its underlying INSTALL_UNWIND_AND_CONTINUE_HANDLER_NO_PROBE ) and then unwind it manually by walking the managed frames using the Windows style unwinder, using EECodeInfo to get the function entry and module base data and calling ProcessCLRException on each frame until the handler is found and then executed.
There can also be a case when we walk out of the managed frames without finding a handler and reach a bunch of native frames. In this case, we need to rethrow the exception and let the native unwinding do its job. If that piece of native code was called from a managed method, we catch the exception on the border and again use the manual unwinding described above until the handler is found or native frames are reached again.
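To make the alternation between manual managed unwinding and native rethrow concrete, here is a toy model in C. It is purely illustrative and does not use any runtime API: the frame struct, the has_handler flag, and the unwind function are hypothetical, with index 0 taken to be the innermost frame where the exception was thrown.

```c
#include <assert.h>
#include <stddef.h>

enum frame_kind { FRAME_MANAGED, FRAME_NATIVE };

/* Hypothetical description of one stack frame. */
struct frame {
    enum frame_kind kind;
    int has_handler;  /* only meaningful for managed frames */
};

struct unwind_result {
    int handler_index;    /* frame that handled the exception, or -1 */
    int native_rethrows;  /* times the exception was handed to native unwinding */
};

/* Walks from the innermost frame (index 0) outwards. Managed frames are
 * processed one by one; a run of native frames is crossed with a single
 * rethrow, after which manual unwinding resumes at the managed border. */
struct unwind_result unwind(const struct frame *stack, size_t depth)
{
    struct unwind_result res = { -1, 0 };
    size_t i = 0;

    assert(stack != NULL || depth == 0);
    while (i < depth) {
        if (stack[i].kind == FRAME_MANAGED) {
            if (stack[i].has_handler) {
                res.handler_index = (int)i;  /* handler found: stop here */
                return res;
            }
            i++;  /* no handler in this managed frame, keep walking */
        } else {
            res.native_rethrows++;  /* rethrow: native unwinding takes over */
            while (i < depth && stack[i].kind == FRAME_NATIVE)
                i++;
        }
    }
    return res;  /* ran off the stack without finding a handler */
}
```

A stack of two managed frames, two native frames, and then a managed frame with a handler yields handler_index 4 with one native rethrow, matching the "catch on the border and continue manually" behaviour described in the plan.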
Exceptions thrown by the runtime methods called from the managed code
This is the same case as the previous one. The code in these methods also uses INSTALL_UNWIND_AND_CONTINUE_HANDLER / UNINSTALL_UNWIND_AND_CONTINUE_HANDLER , so the exception is caught and processed the same way.
Exceptions thrown by PInvoked functions
We will not try to handle these (except for cases where the PInvoke calls QCALLS which also use the INSTALL_UNWIND_AND_CONTINUE_HANDLER / UNINSTALL_UNWIND_AND_CONTINUE_HANDLER and so the exceptions are processed as described above).
Getting an exception from a general PInvoke means that code outside our control has crashed and its state is unknown and cannot be relied on anymore. Trying to handle such a state could potentially lead to deeper issues, so it is better to fail fast.
Exceptions thrown by managed code invoked from runtime
This is basically the same case as described above when managed stack walking walks out of the managed code without finding a handler. So the managed frames are unwound, then the exception is rethrown, and native exception unwinding handles it further.
Hardware exceptions
On Unix, hardware exceptions like invalid instruction, division by zero, null pointer dereference and similar are reported by signals. We will write handlers for these signals that will call back into the runtime. The runtime will check whether the exception has happened in managed code or native code. If it happened in managed code, then the managed exception unwinding as described above will be performed. The current plan is not to try to handle such exceptions in native code and rather to abort.
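A minimal POSIX sketch of this plan follows. It rests on several assumptions: the in_managed_code flag is a hypothetical stand-in for however the runtime decides where the fault happened, siglongjmp to a recovery point stands in for managed unwinding, and raise() is used to deliver the signal deterministically instead of executing a real faulting instruction. None of these names come from the runtime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static sigjmp_buf recovery_point;               /* stands in for managed unwinding */
static volatile sig_atomic_t in_managed_code;   /* hypothetical "where are we" flag */
static volatile sig_atomic_t caught_signal;

/* Signal handler: recover if the fault came from "managed" code, abort otherwise. */
static void on_hw_signal(int signo)
{
    if (!in_managed_code)
        abort();                      /* native code: fail fast */
    caught_signal = signo;
    siglongjmp(recovery_point, 1);    /* "managed" code: resume at the recovery point */
}

/* Runs managed_fn with the handlers installed. Returns 0 on a normal
 * return, or the number of the signal that interrupted it. */
int run_managed(void (*managed_fn)(void))
{
    struct sigaction sa, old_segv, old_fpe;

    assert(managed_fn != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_hw_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGFPE, &sa, &old_fpe);

    caught_signal = 0;
    if (sigsetjmp(recovery_point, 1) == 0) {  /* 1: restore the signal mask on jump */
        in_managed_code = 1;
        managed_fn();
    }
    in_managed_code = 0;

    sigaction(SIGSEGV, &old_segv, NULL);      /* restore the previous handlers */
    sigaction(SIGFPE, &old_fpe, NULL);
    return caught_signal;
}

/* Simulates a managed method hitting a hardware exception. */
void faulting_managed_method(void)
{
    raise(SIGFPE);
}

/* A managed method that completes normally. */
void quiet_managed_method(void)
{
}
```

run_managed(faulting_managed_method) reports SIGFPE, while run_managed(quiet_managed_method) reports 0. A real runtime would have to inspect the faulting instruction pointer from the signal context rather than rely on a flag, which this sketch deliberately leaves out.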