What is deadlock in linux

Deadlock refers to a blocking phenomenon caused by two or more processes due to competing resources or due to communication with each other. If there is no external force, they will not be able to advance. At this time, the system is said to be in a deadlock state or the system has a deadlock. These processes that are always waiting for each other are called deadlock processes.

Four conditions for deadlock

Mutually exclusive condition : Refers to the exclusive use of the allocated resources by the process, that is, a resource is occupied by only one process within a certain period of time. If there are other processes requesting resources at this time, the requester can only wait until the process occupying the resources is released.

Request and hold conditions : It means that the process has kept at least one resource, but has made a new resource request, and the resource has been occupied by other processes. At this time, the requesting process is blocked, but it still keeps the other resources it has obtained.

Inalienable conditions : Refers to the resources that the process has acquired. It cannot be deprived before it is used up. It can only be released by itself when it is used up.

Loop waiting condition : Refers to a process-resource ring chain when a deadlock occurs, that is, P0 in the process set is waiting for a resource occupied by P1; P1 is waiting for the resource occupied by P2, …, Pn is waiting for resources already occupied by P0

Only when these four conditions are met will a deadlock occur

How to avoid deadlock

The system performs a dynamic check on each resource request that the system can satisfy, and decides whether to allocate resources based on the check result; if a deadlock may occur in the system after allocation, it will not be allocated, otherwise it will be allocated. This is a dynamic strategy to ensure that the system does not enter a deadlock state.

Prevent deadlock

A simple understanding is to destroy any one of the conditions that causes a deadlock, because deadlock generation requires four conditions to be met at the same time. When the deadlock is generated, the deadlock must be unlocked by external force conditions, such as killing the deadlock directly Rough methods such as processes in

Here are two algorithms to prevent deadlock, these two algorithms are also closely related to some situations in life, especially the second

Ordered resource allocation

Such algorithm resources are uniformly numbered according to all resources in a certain rule system (for example, printer is 1, tape drive is 2, disk is 3, etc.), and the application must be in ascending order. The system requires the application process:
- 1. All resources that must be used and belong to the same category must be applied for at one time;
- 2. When applying for different types of resources, you must apply in order according to the number of various types of equipment. For example: process PA, the order of using resources is R1, R2; process PB, the order of using resources is R2, R1; if dynamic allocation is used, it may form a loop condition and cause a deadlock.
Use the orderly resource allocation method: R1 is numbered 1, R2 is numbered 2;
PA: The application order should be: R1, R2
PB: The application sequence should be: R1, R2

This destroys the loop condition and avoids deadlock

The most representative algorithm in the deadlock avoidance algorithm is the banker algorithm proposed by Dijkstra E.W in 1968:
The banker’s algorithm is an important method to avoid deadlock. The institution that prevents deadlock can only ensure that one of the above four conditions does not occur, and the system will not be deadlocked. This algorithm can be used to solve practical problems in life, such as bank loans.
Program implementation ideas The banker algorithm, as the name implies, comes from the bank’s lending business. A certain amount of principal should be loaned to multiple customers. In order to prevent bankers’ funds from failing to close down, each loan must be checked for its ability. Return by deadline. There are similar problems when studying resource allocation strategies in the operating system. The limited resources in the system must be used by multiple processes. It must be ensured that the process of the obtained resources can return the resources within a limited time for other processes to use the resources. If the resource allocation is not obtained, a process looping waiting for resources will occur, and then the process cannot continue to execute the deadlock phenomenon.
Record the need of a process and the resources it has occupied in the process control. Assume that the «state» of the process control block PCB includes the ready state, the waiting state and the completion state. When the process is in the waiting state, it means that the system cannot satisfy the current resource request of the process. «Total resource demand» means the total amount of resources that the process has to apply for during the entire execution process. Obviously, the total resource requirement of each process cannot exceed the total number of resources owned by the system, and the bank algorithm can allocate resources to avoid deadlock.

Deadlock detection and release

First detection: This method does not need to take any restrictive measures in advance, nor does it need to check whether the system has entered an unsafe area, this method allows the system to deadlock during operation. However, the detection mechanism provided by the system can timely detect the occurrence of the deadlock and accurately determine the processes and resources related to the deadlock. Detection methods include timing detection, low-efficiency detection, and process-waiting detection.
Then remove the deadlock: take appropriate measures to remove the deadlock that has occurred from the system.

This is a measure matched with the detection of deadlock. When it is detected that a deadlock has occurred in the system, the process must be released from the deadlock state. A common implementation method is to undo or suspend some processes in order to reclaim some resources, and then allocate these resources to the process that is already in the blocking state, making it ready to continue running. Deadlock detection and release measures may make the system obtain better resource utilization and throughput, but it is also the most difficult to achieve.

Источник

1. Introduction to Lockdep

A deadlock is a phenomenon in which two or more processes are waiting for each other due to competition for resources.

There are two common types of deadlocks:

Recursive deadlocks: Locks are used in deferred operations such as interrupts, and external locks constitute recursive deadlocks.

AB-BA deadlock: Multiple locks cause deadlocks due to improper handling, and inconsistent processing orders on multiple kernel paths can also cause deadlocks.

The Linux kernel provides a deadlock debugging module, Lockdep, which tracks the state of each lock and the dependencies between the locks. A series of validation rules are used to ensure that the dependencies between the locks are correct.

2. Kernel deadlock detection Lockdep

2.1 Enabling Lockdep

The locks detected by Lockdep include spinlock, rwlock, mutex, rwsem deadlock, lock error release, sleep behavior in atomic operations, etc.

The configuration path in the kernel is: Kernel hacking->Lock Debugging (spinlocks, mutexes, etc. ).

Here are the lockcep kernel options and their explanations:

Detect deadlocks of rt mutex and automatically report deadlock scene information.

Detect problems such as uninitialized use of spinlock. With the NMI watchdog, you can find the spinlock deadlock.

Detect and report mutex errors

Detect the slowpath test of the wait/wound type mutex.

Check that the lock in use (spinlock/rwlock/mutex/rwsem) is released, or that the lock in use is reinitialized, or that the lock is held when the process exits.

Enables the kernel to report deadlock details before a deadlock occurs. See /proc/lockdep_chains.

The total switch of the entire Lockdep. See /proc/lockdep, /proc/lockdep_stats.

The lock holds information about the competition area, including waiting time, holding time, and so on. See /proc/lock_stat.

More self-testing during the use of Lockdep will add a lot of extra overhead.

Sleeping in the atomic section can cause a lot of unpredictable problems. These atomic sections include spinlock locks, rcu read operations, kernel preemption, interrupt handling, and so on.

/proc/sys/kernel/lock_stat———————— Set to view the /proc/lock_stat statistics, clear the lockdep statistics .

/proc/lockdep_stats—————————— Statistic information for dependency locks

The kernel also provides Tracepoint to help discover the use of locks: /sys/kernel/debug/tracing/events/lock.

2.3 lockdep code simple analysis

3. Lockdep test

3.1 Testing the spin_lock deadlock

Construct the test case code as follows:

After executing insmod data/lock.ko, the console displays as follows.

First of all, you can know the deadlock type from the description of the deadlock.

Then it details the point at which the deadlock is generated. At this point, you can roughly know which lock it is, and where it is called to cause a deadlock.

This is followed by a detailed backtrace of deadlocks, which helps to analyze the stack traceback when deadlocks are generated.

al: lockdep error test init
hack_lockdep:A->B
hack_lockdep:B->A

=============================================
[ INFO: possible recursive locking detected ]————————————————- —————Detected deadlock description: recursive deadlock type
4.0.0+ #87 Tainted: G O
———————————————
Insmod/658 is trying to acquire lock:—————————————— ———————————- Deadlock details description: want to hold the lock point and the lock point
(hack_spinB)<+.+. >, at: [ ] lockdep_test_init+0x30/0x3c [lock] —————————lockdep_test_init calls hack_spinBA to hold hack_spinB lock again

but task is already holding lock:
(hack_spinB)<+.+. >, at: [ ] hack_spinAB+0x38/0x3c [lock] ——————————— hack_spinB is already held in the hack_spinAB function

Other info that might help us debug this:—————————————— —————————— Additional information about the lock
Possible unsafe locking scenario:

May be due to missing lock nesting notation

2 locks held by insmod/658:—————————————— ———————————————- Processes are held together Two locks
#0: (hack_spinA)<+.+. >, at: [ ] hack_spinAB+0x30/0x3c [lock]
#1: (hack_spinB)<+.+. >, at: [ ] hack_spinAB+0x38/0x3c [lock]

stack backtrace:——————————————————————————————————— Stack traceback information: You can see the call path from lockdep_test_init->_raw_spin_lock->lock_acquire.
CPU: 0 PID: 658 Comm: insmod Tainted: G O 4.0.0+ #87
Hardware name: ARM-Versatile Express
[ ] (unwind_backtrace) from [ ] (show_stack+0x20/0x24)
[ ] (show_stack) from [ ] (dump_stack+0x8c/0xb4)
[ ] (dump_stack) from [ ] (__lock_acquire+0x1aa4/0x1f64)
[ ] (__lock_acquire) from [ ] (lock_acquire+0xf4/0x190)
[ ] (lock_acquire) from [ ] (_raw_spin_lock+0x60/0x98)
[ ] (_raw_spin_lock) from [ ] (lockdep_test_init+0x30/0x3c [lock])
[ ] (lockdep_test_init [lock]) from [ ] (do_one_initcall+0x9c/0x1e8)
[ ] (do_one_initcall) from [ ] (do_init_module+0x70/0x1c0)
[ ] (do_init_module) from [ ] (load_module+0x18b0/0x1f90)
[ ] (load_module) from [ ] (SyS_init_module+0x140/0x150)
[ ] (SyS_init_module) from [ ] (ret_fast_syscall+0x0/0x4c)
INFO: rcu_sched self-detected stall on CPU
0: (2099 ticks this GP) idle=5ed/140000000000001/0 softirq=13024/13024 fqs=1783
(t=2100 jiffies g=-51 c=-52 q=22)
Task dump for CPU 0:
insmod R running 0 658 657 0x00000002
[ ] (unwind_backtrace) from [ ] (show_stack+0x20/0x24)
[ ] (show_stack) from [ ] (sched_show_task+0x128/0x184)
[ ] (sched_show_task) from [ ] (dump_cpu_task+0x48/0x4c)
[ ] (dump_cpu_task) from [ ] (rcu_dump_cpu_stacks+0x9c/0xd4)
[ ] (rcu_dump_cpu_stacks) from [ ] (rcu_check_callbacks+0x640/0x968)
[ ] (rcu_check_callbacks) from [ ] (update_process_times+0x4c/0x74)
[ ] (update_process_times) from [ ] (tick_periodic+0x54/0xf8)
[ ] (tick_periodic) from [ ] (tick_handle_periodic+0x38/0x98)
[ ] (tick_handle_periodic) from [ ] (twd_handler+0x40/0x50)
[ ] (twd_handler) from [ ] (handle_percpu_devid_irq+0xd8/0x1dc)
[ ] (handle_percpu_devid_irq) from [ ] (generic_handle_irq+0x3c/0x4c)
[ ] (generic_handle_irq) from [ ] (__handle_domain_irq+0x6c/0xc4)
[ ] (__handle_domain_irq) from [ ] (gic_handle_irq+0x34/0x6c)
[ ] (gic_handle_irq) from [ ] (__irq_svc+0x44/0x5c)
Exception stack(0xed5c9d18 to 0xed5c9d60)
9d00: 00000000 00010000
9d20: 0000ffff c02f3898 bf0001b0 c0b1d248 123cc000 00000000 0c99b2c5 00000000
9d40: 00000000 ed5c9d84 ed5c9d60 ed5c9d60 c0070cb4 c0070cb4 60000013 ffffffff
[ ] (__irq_svc) from [ ] (do_raw_spin_lock+0xf0/0x1e0)
[ ] (do_raw_spin_lock) from [ ] (_raw_spin_lock+0x84/0x98)
[ ] (_raw_spin_lock) from [ ] (lockdep_test_init+0x30/0x3c [lock])
[ ] (lockdep_test_init [lock]) from [ ] (do_one_initcall+0x9c/0x1e8)
[ ] (do_one_initcall) from [ ] (do_init_module+0x70/0x1c0)
[ ] (do_init_module) from [ ] (load_module+0x18b0/0x1f90)
[ ] (load_module) from [ ] (SyS_init_module+0x140/0x150)
[ ] (SyS_init_module) from [ ] (ret_fast_syscall+0x0/0x4c)
BUG: spinlock lockup suspected on CPU#0, insmod/658———————————————————— The error type is spinlock, and the following backtrace is basically the same as above.
lock: hack_spinB+0x0/0xfffffedc [lock], .magic: dead4ead, .owner: insmod/658, .owner_cpu: 0———— The deadlock is hack_spinB
CPU: 0 PID: 658 Comm: insmod Tainted: G O 4.0.0+ #87
Hardware name: ARM-Versatile Express
[ ] (unwind_backtrace) from [ ] (show_stack+0x20/0x24)
[ ] (show_stack) from [ ] (dump_stack+0x8c/0xb4)
[ ] (dump_stack) from [ ] (spin_dump+0x8c/0xd0)
[ ] (spin_dump) from [ ] (do_raw_spin_lock+0x10c/0x1e0)
[ ] (do_raw_spin_lock) from [ ] (_raw_spin_lock+0x84/0x98)
[ ] (_raw_spin_lock) from [ ] (lockdep_test_init+0x30/0x3c [lock])
[ ] (lockdep_test_init [lock]) from [ ] (do_one_initcall+0x9c/0x1e8)
[ ] (do_one_initcall) from [ ] (do_init_module+0x70/0x1c0)
[ ] (do_init_module) from [ ] (load_module+0x18b0/0x1f90)
[ ] (load_module) from [ ] (SyS_init_module+0x140/0x150)
[ ] (SyS_init_module) from [ ] (ret_fast_syscall+0x0/0x4c)

3.2 mutex test

Execute insmod /data/mutexlock.ko and the results are as follows.

The first is the introduction of deadlock types.

Then there is the caller of the two points that generated the deadlock, and then the stack backtracking of the two points is given in detail.

Finally, the detailed stack traceback of the deadlock point.

The backtrace above, in contrast to the code flow below, will only print the relevant information when CONFIG_PROVE_LOCKING is turned on.

Источник

What is deadlock in linux