Spinlocks in Linux

Locking lessons

Lesson 1: Spin locks

The most basic primitive for locking is spinlock:
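(A minimal sketch of the usual pattern; xxx_lock is just a placeholder name and the critical-section body is elided.)

    static DEFINE_SPINLOCK(xxx_lock);

    unsigned long flags;

    spin_lock_irqsave(&xxx_lock, flags);
    /* ... critical section here ... */
    spin_unlock_irqrestore(&xxx_lock, flags);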

The above is always safe. It will disable interrupts _locally_, but the spinlock itself will guarantee the global lock, so it will guarantee that there is only one thread-of-control within the region(s) protected by that lock. This works well even under UP also, so the code does _not_ need to worry about UP vs SMP issues: the spinlocks work correctly under both.

NOTE! Implications of spin_locks for memory are further described in Documentation/memory-barriers.txt (see the “ACQUIRE operations” and “RELEASE operations” sections).

The above is usually pretty simple (you usually need and want only one spinlock for most things — using more than one spinlock can make things a lot more complex and even slower and is usually worth it only for sequences that you know need to be split up: avoid it at all cost if you aren’t sure).

This is really the only really hard part about spinlocks: once you start using spinlocks they tend to expand to areas you might not have noticed before, because you have to make sure the spinlocks correctly protect the shared data structures everywhere they are used. The spinlocks are most easily added to places that are completely independent of other code (for example, internal driver data structures that nobody else ever touches).

NOTE! The spin-lock is safe only when you also use the lock itself to do locking across CPU’s, which implies that EVERYTHING that touches a shared variable has to agree about the spinlock they want to use.

Lesson 2: reader-writer spinlocks.

If your data accesses have a very natural pattern where you usually tend to mostly read from the shared variables, the reader-writer locks (rw_lock) versions of the spinlocks are sometimes useful. They allow multiple readers to be in the same critical region at once, but if somebody wants to change the variables it has to get an exclusive write lock.

NOTE! reader-writer locks require more atomic memory operations than simple spinlocks. Unless the reader critical section is long, you are better off just using spinlocks.

The routines look the same as above:
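(A sketch of the reader-writer equivalents, again with a placeholder xxx_lock.)

    static DEFINE_RWLOCK(xxx_lock);

    unsigned long flags;

    read_lock_irqsave(&xxx_lock, flags);
    /* ... critical section that only reads the info ... */
    read_unlock_irqrestore(&xxx_lock, flags);

    write_lock_irqsave(&xxx_lock, flags);
    /* ... read and write exclusive access to the info ... */
    write_unlock_irqrestore(&xxx_lock, flags);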

The above kind of lock may be useful for complex data structures like linked lists, especially searching for entries without changing the list itself. The read lock allows many concurrent readers. Anything that changes the list will have to get the write lock.

NOTE! RCU is better for list traversal, but requires careful attention to design detail (see Using RCU to Protect Read-Mostly Linked Lists ).

Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ time need to do any changes (even if you don’t do it every time), you have to get the write-lock at the very beginning.

NOTE! We are working hard to remove reader-writer spinlocks in most cases, so please don’t add a new one without consensus. (Instead, see RCU Concepts for complete information.)

Lesson 3: spinlocks revisited.

The single spin-lock primitives above are by no means the only ones. They are the most safe ones, and the ones that work under all circumstances, but partly because they are safe they are also fairly slow. They are slower than they’d need to be, because they do have to disable interrupts (which is just a single instruction on a x86, but it’s an expensive one — and on other architectures it can be worse).

If you have a case where you have to protect a data structure across several CPU’s and you want to use spinlocks you can potentially use cheaper versions of the spinlocks. IFF you know that the spinlocks are never used in interrupt handlers, you can use the non-irq versions:
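(A minimal sketch of the non-irq calls.)

    spin_lock(&lock);
    /* ... critical section ... */
    spin_unlock(&lock);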

(and the equivalent read-write versions too, of course). The spinlock will guarantee the same kind of exclusive access, and it will be much faster. This is useful if you know that the data in question is only ever manipulated from a “process context”, ie no interrupts involved.

The reasons you mustn’t use these versions if you have interrupts that play with the spinlock is that you can get deadlocks:
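(Schematically, not compilable code: the problematic interleaving on a single CPU.)

    spin_lock(&lock);
    ...
            <- interrupt comes in:
                    spin_lock(&lock);   /* spins forever: the holder can never run */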

where an interrupt tries to lock an already locked variable. This is ok if the other interrupt happens on another CPU, but it is _not_ ok if the interrupt happens on the same CPU that already holds the lock, because the lock will obviously never be released (because the interrupt is waiting for the lock, and the lock-holder is interrupted by the interrupt and will not continue until the interrupt has been processed).

(This is also the reason why the irq-versions of the spinlocks only need to disable the _local_ interrupts — it’s ok to use spinlocks in interrupts on other CPU’s, because an interrupt on another CPU doesn’t interrupt the CPU that holds the lock, so the lock-holder can continue and eventually releases the lock).

Reference information:

For dynamic initialization, use spin_lock_init() or rwlock_init() as appropriate:
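(For example; a sketch where xxx_init is a placeholder module init function.)

    static spinlock_t xxx_lock;
    static rwlock_t xxx_rw_lock;

    static int __init xxx_init(void)
    {
            spin_lock_init(&xxx_lock);
            rwlock_init(&xxx_rw_lock);
            /* ... */
            return 0;
    }

    module_init(xxx_init);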

For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or __SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate.

© Copyright The kernel development community.

Synchronization primitives in the Linux kernel. Part 2.

Queued Spinlocks

This is the second part of the chapter which describes synchronization primitives in the Linux kernel. In the first part of this chapter we met the first spinlock. We will continue to learn about this synchronization primitive here. If you have read the previous part, you may remember that besides normal spinlocks, the Linux kernel provides a special type of spinlocks — queued spinlocks. Here we will try to understand what this concept represents.

We saw the API of spinlock in the previous part:

  • spin_lock_init — initializes the given spinlock;
  • spin_lock — acquires the given spinlock;
  • spin_lock_bh — disables software interrupts and acquires the given spinlock;
  • spin_lock_irqsave and spin_lock_irq — disable interrupts on the local processor and acquire the given spinlock; spin_lock_irqsave additionally saves the previous interrupt state in flags, spin_lock_irq does not;
  • spin_unlock — releases the given spinlock;
  • spin_unlock_bh — releases the given spinlock and re-enables software interrupts;
  • spin_is_locked — returns the state of the given spinlock;
  • and so on.

And we know that all of these macros ultimately expand, through the arch_spin_* layer, into calls of functions from the include/asm-generic/qspinlock.h header file:
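(The mapping in include/asm-generic/qspinlock.h looks roughly like this; the exact list varies a little between kernel versions.)

    #define arch_spin_is_locked(l)          queued_spin_is_locked(l)
    #define arch_spin_is_contended(l)       queued_spin_is_contended(l)
    #define arch_spin_value_unlocked(l)     queued_spin_value_unlocked(l)
    #define arch_spin_lock(l)               queued_spin_lock(l)
    #define arch_spin_trylock(l)            queued_spin_trylock(l)
    #define arch_spin_unlock(l)             queued_spin_unlock(l)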

Before we consider how queued spinlocks and their API are implemented, let’s first take a look at the theory.

Introduction to queued spinlocks

Queued spinlocks are a locking mechanism in the Linux kernel which is a replacement for the standard spinlocks. At least this is true for the x86_64 architecture. If we look at the kernel configuration file — kernel/Kconfig.locks — we will see the following configuration entries:
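(Approximately; the exact conditions differ slightly between kernel versions.)

    config ARCH_USE_QUEUED_SPINLOCKS
            bool

    config QUEUED_SPINLOCKS
            def_bool y if ARCH_USE_QUEUED_SPINLOCKS
            depends on SMP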

This means that the CONFIG_QUEUED_SPINLOCKS kernel configuration option will be enabled by default if ARCH_USE_QUEUED_SPINLOCKS is enabled. We can see that ARCH_USE_QUEUED_SPINLOCKS is selected by default in the x86_64 specific kernel configuration file — arch/x86/Kconfig:
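(Heavily abridged.)

    config X86
            def_bool y
            # ...
            select ARCH_USE_QUEUED_SPINLOCKS
            # ...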

Before we start to consider what the queued spinlock concept is, let's look at other types of spinlocks. To start, let's consider how a normal spinlock is implemented. Usually, the implementation of a normal spinlock is based on the test and set instruction. The principle of this instruction is pretty simple: it writes a value to a memory location and returns the old value from it. Both steps together form one atomic, i.e. non-interruptible, operation. So if the first thread starts to execute this instruction, the second thread will wait until the first processor has finished. A basic lock can be built on top of this mechanism. Schematically it may look like this:
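(An illustrative sketch only — test_and_set stands for whatever atomic read-modify-write primitive the architecture provides, not a real kernel function.)

    int lock_var = 0;                       /* 0 - unlocked, 1 - locked */

    void lock(int *lock_var)
    {
            /* spin until we are the ones who flipped 0 -> 1 */
            while (test_and_set(lock_var) == 1)
                    ;
    }

    void unlock(int *lock_var)
    {
            *lock_var = 0;                  /* release the lock */
    }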

The first thread will execute test_and_set, which will set the lock to 1. When the second thread calls the lock function, it will spin in the while loop until the first thread calls the unlock function and the lock becomes 0. This implementation is not very good for performance reasons, due to (at least) two problems. The first problem is that this implementation may be unfair: a thread which arrived at the lock later may acquire it first. The second problem is that all threads which want to acquire the lock must execute many atomic operations like test_and_set on a variable in shared memory. This leads to cache invalidation, as the processor's cache will store lock = 1 while the value of the lock in memory may no longer be 1 once a thread releases the lock.

The topic of this part is queued spinlocks. This approach helps to solve both of these problems. A queued spinlock allows each processor to spin while checking its own memory location. The basic principle of a queue-based spinlock can best be understood by studying a classic queue-based spinlock implementation called the MCS lock. Before we look at the implementation of queued spinlocks in the Linux kernel, we will try to understand how the MCS lock works.

The basic idea of the MCS lock is that a thread spins on a local variable and each processor in the system has its own copy of this variable (see the previous paragraph). In other words this concept is built on top of the per-cpu variables concept in the Linux kernel.

When the first thread wants to acquire the lock, it registers itself in the queue. In other words it is added to a special queue and acquires the lock, because the lock is free for now. When the second thread wants to acquire the same lock before the first thread releases it, this thread adds its own copy of the lock variable into this queue. In this case the first thread's next field will point to the second thread. From this moment, the second thread waits until the first thread releases its lock and notifies the next thread about this event. The first thread is then deleted from the queue and the second thread becomes the owner of the lock.

Schematically: the first thread to arrive finds the queue empty, so it acquires the lock immediately and becomes the head of the queue. A second thread that arrives while the lock is still held appends its own node to the queue, gets linked in through the first thread's next pointer, and spins on its own locked flag until the first thread releases the lock and notifies it. Or, in pseudocode:
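(An illustrative sketch only; the queue helpers — add_to_queue_and_return_previous_tail, remove_from_queue — are hypothetical names, not kernel functions.)

    void lock(queue, node)
    {
            node.next = NULL;

            /* atomically append our node and get the previous tail, if any */
            prev = add_to_queue_and_return_previous_tail(queue, node);

            if (prev != NULL) {
                    /* lock is held: link ourselves in and spin on our own flag */
                    node.locked = 1;
                    prev.next = node;
                    while (node.locked == 1)
                            ;
            } else {
                    /* the queue was empty - we own the lock right away */
                    node.locked = 0;
            }
    }

    void unlock(queue, node)
    {
            if (node.next != NULL)
                    node.next.locked = 0;           /* hand the lock to the next waiter */
            else
                    remove_from_queue(queue, node); /* nobody is waiting */
    }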

That's all we'll say about the theory of queued spinlocks. Now let's consider how this mechanism is implemented in the Linux kernel. Unlike the pseudocode above, the real implementation of queued spinlocks looks complex and tangled, but studying it carefully will pay off.

API of queued spinlocks

Now that we know a little about queued spinlocks from the theoretical side, it's time to see the implementation of this mechanism in the Linux kernel. As we saw above, the include/asm-generic/qspinlock.h header file provides the queued_spin_* functions behind the arch_spin_* macros — the API for acquiring and releasing a spinlock, and so on.

All of these macros expand to calls of functions from the same header file. Additionally, there is the qspinlock structure from the include/asm-generic/qspinlock_types.h header file which represents a queued spinlock in the Linux kernel:
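(Conceptually, a simplified sketch; current kernels also expose the locked byte, pending byte and tail halfword through a union inside this structure.)

    typedef struct qspinlock {
            atomic_t val;
    } arch_spinlock_t;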

The val field represents the state of a given spinlock. This 4-byte field consists of the following parts:

  • 0-7 — locked byte;
  • 8 — pending bit;
  • 9-15 — not used;
  • 16-17 — two-bit index into the per-CPU array of MCS lock nodes (we will see it soon);
  • 18-31 — the number of the processor at the tail of the queue.

Before we move on to the API of queued spinlocks, notice that the val field of the qspinlock structure has type atomic_t, which represents an atomic, i.e. «one operation at a time», variable. So all operations on this field are atomic. For example, let's look at how the value of val is read:
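(A sketch close to queued_spin_is_locked from include/asm-generic/qspinlock.h, which simply does an atomic read of val.)

    static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
    {
            /* any non-zero state (locked, pending or queued) counts as locked */
            return atomic_read(&lock->val);
    }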

OK, now we know the data structure which represents a queued spinlock in the Linux kernel, and it is time to look at the implementation of the main function of the queued spinlocks API:

Yes, this function is queued_spin_lock. As we may understand from the function's name, it allows a thread to acquire a lock. This function is defined in the include/asm-generic/qspinlock.h header file and its implementation looks like this:
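(A sketch of the implementation; newer kernels phrase the fast path with atomic_try_cmpxchg_acquire, but the logic is the same.)

    static __always_inline void queued_spin_lock(struct qspinlock *lock)
    {
            u32 val;

            /* fast path: if val was 0 (unlocked), atomically set the locked byte */
            val = atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL);
            if (likely(val == 0))
                    return;

            /* somebody already holds or contends the lock - take the slow path */
            queued_spin_lock_slowpath(lock, val);
    }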

Looks pretty easy, except for the queued_spin_lock_slowpath function. We can see that queued_spin_lock takes only one parameter, which represents the queued spinlock to be locked. Let's consider the situation where the queue of waiters is empty for now and the first thread wants to acquire the lock. As we can see, the queued_spin_lock function starts with a call of the atomic_cmpxchg_acquire macro. As you may guess from its name, it executes an atomic CMPXCHG instruction; on x86 it ultimately boils down to the __raw_cmpxchg macro, whose behaviour is roughly the following:
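(A plain-C model of the compare-and-exchange semantics; the real __raw_cmpxchg is x86 inline assembly, and the function and parameter names here are only illustrative.)

    static inline u32 cmpxchg_model(u32 *ptr, u32 old, u32 new)
    {
            u32 ret = *ptr;         /* value currently in memory            */

            if (ret == old)         /* only if it matches the expected old  */
                    *ptr = new;     /* ... store the new value              */

            return ret;             /* always return the initial value      */
    }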

which compares old with the value pointed to by ptr; if they are equal, it stores new in the memory location pointed to by ptr. In either case it returns the value that was initially in that memory location.

Let's get back to the queued_spin_lock function. Assuming that we are the first one who tried to acquire the lock, val will be zero, the likely(val == 0) branch is taken, and we return from queued_spin_lock with the lock held.

So far we've only considered the uncontended case (the fast path). Now let's consider the contended case (the slow path). Suppose that a thread tries to acquire a lock but the lock is already held; then queued_spin_lock_slowpath will be called. The queued_spin_lock_slowpath function is defined in the kernel/locking/qspinlock.c source code file and begins like this:
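(An abridged excerpt; the paravirt-related code before it is omitted and details vary between kernel versions.)

    /* top of queued_spin_lock_slowpath(struct qspinlock *lock, u32 val): */
    if (val == _Q_PENDING_VAL) {
            int cnt = _Q_PENDING_LOOPS;

            /* spin a bounded number of times while 0,1,0 -> 0,0,1 completes */
            val = atomic_cond_read_relaxed(&lock->val,
                                           (VAL != _Q_PENDING_VAL) || !cnt--);
    }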

which waits for an in-progress pending→locked hand-over to finish with a bounded number of spins, so that forward progress is guaranteed. Above, we saw that the lock contains a pending bit. This bit is set by a thread which wants to acquire the lock while it is already held by another thread and the queue is empty at the same time. In this case the pending bit is set and the queue is not touched. This is an optimization: it avoids the unnecessary latency caused by the cache-line traffic of touching our own mcs_spinlock array.

If we observe contention, then we have no choice other than queueing, so we jump to the queue label that we'll see later:
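(A slightly simplified excerpt.)

    /* if anything besides the locked byte is set, go and queue up */
    if (val & ~_Q_LOCKED_MASK)
            goto queue;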

So, at most the locked byte is set, while the pending bit and the queue are free. In that case we set the pending bit of the lock:
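(A slightly simplified excerpt; the helper's exact name and form vary between kernel versions.)

    /* 0,0,* -> 0,1,* : announce ourselves as the pending waiter */
    val = queued_fetch_set_pending_acquire(lock);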

Again, if we observe contention, we undo the pending bit and fall back to queueing:
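(A slightly simplified excerpt.)

    if (unlikely(val & ~_Q_LOCKED_MASK)) {
            /* undo PENDING only if we were the ones who set it */
            if (!(val & _Q_PENDING_MASK))
                    clear_pending(lock);

            goto queue;
    }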

Now we are the pending waiter; we wait for the lock owner to release the lock:
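(A slightly simplified excerpt; lock->locked is the locked byte exposed through the union in struct qspinlock.)

    /* spin until the locked byte becomes zero */
    if (val & _Q_LOCKED_MASK)
            smp_cond_load_acquire(&lock->locked, !VAL);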

We are allowed to take the lock. So we clear the pending bit and set the locked bit. At this point we are done with the queued_spin_lock_slowpath function and return from it:
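(A slightly simplified excerpt.)

    /* 0,1,0 -> 0,0,1 : take ownership and clear the pending bit */
    clear_pending_set_locked(lock);
    return;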

Before diving into queueing, we'll look at the MCS lock mechanism first. As we already know, each processor in the system has its own copy of the lock node. This node is represented by the following structure:
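(Essentially as in kernel/locking/mcs_spinlock.h.)

    struct mcs_spinlock {
            struct mcs_spinlock *next;
            int locked;     /* 1 if lock acquired */
            int count;      /* nesting count, see qspinlock.c */
    };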

from the kernel/locking/mcs_spinlock.h header file. The first field is a pointer to the next thread in the queue. The second field represents the state of the current thread in the queue: 1 means the lock has been acquired (or handed over), 0 otherwise. The last field of the mcs_spinlock structure counts nested lock attempts. To understand what a nested lock is, imagine a situation where a thread has acquired a lock but is interrupted by a hardware interrupt, and the interrupt handler tries to take a lock too. For this case each processor has not just one copy of the mcs_spinlock structure but an array of these structures:
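(Roughly as declared in recent kernel/locking/qspinlock.c; older kernels used an array of struct mcs_spinlock named mcs_nodes, and the paravirt fields of struct qnode are omitted here.)

    #define MAX_NODES       4

    struct qnode {
            struct mcs_spinlock mcs;
    };

    static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]);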

This array allows up to four lock-acquisition attempts to be in flight on one CPU, one for each of the following contexts:

  • normal task context;
  • hardware interrupt context;
  • software interrupt context;
  • non-maskable interrupt context.

Notice that we have not touched the queue yet. We do not need it, because for just two contending threads it would only add unnecessary memory-access latency. But if we reach this point, lock->val contains _Q_LOCKED_VAL | _Q_PENDING_VAL (or a queue tail), i.e. there is real contention, and we start to build the queue. We start by getting the per-CPU qnodes array of the processor which executes the current thread, and by calculating tail, which will identify the tail of the queue, and idx, which is the index into the qnodes array:
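(A slightly simplified excerpt.)

    node = this_cpu_ptr(&qnodes[0].mcs);
    idx  = node->count++;                           /* nesting level on this CPU */
    tail = encode_tail(smp_processor_id(), idx);    /* pack CPU number and index */

    node = grab_mcs_node(node, idx);                /* pick the node for this level */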

After this, we set locked to zero, because this thread has not acquired the lock yet, and next to NULL, because we don't know anything about other queue entries yet:
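(A slightly simplified excerpt.)

    node->locked = 0;
    node->next   = NULL;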

We have already touched a (possibly cold) per-CPU queue node for the processor executing the current thread, which takes time, so the owner of the lock may have released it in the meantime. Therefore we try to acquire the lock once more by calling the queued_spin_trylock function:
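(A slightly simplified excerpt.)

    if (queued_spin_trylock(lock))
            goto release;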

It does almost the same thing the queued_spin_lock fast path does: a single atomic compare-and-exchange attempt on lock->val, without falling back to the slow path.

If the lock was successfully acquired, we jump to the release label to release the node of the queue:
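(A slightly simplified excerpt.)

    release:
            /* release the per-CPU node: drop the nesting count */
            __this_cpu_dec(qnodes[0].mcs.count);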

because we don't need it anymore, as the lock is acquired. If queued_spin_trylock was unsuccessful, we publish our node as the new tail of the queue:
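(A slightly simplified excerpt.)

    /* p,*,* -> n,*,* : publish the updated tail */
    old  = xchg_tail(lock, tail);
    next = NULL;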

and retrieve the previous tail. The next step is to check whether the queue was non-empty; in that case we need to link the previous entry with the new one and wait until we reach the head of the queue. While waiting for the MCS lock, the next pointer may already have been set by another lock waiter, so we optimistically load the next pointer and prefetch the cache line for writing, to reduce latency in the upcoming MCS unlock operation:
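(A slightly simplified excerpt; the paravirt hooks are omitted.)

    if (old & _Q_TAIL_MASK) {
            prev = decode_tail(old);

            /* link our node into the waitqueue behind the previous tail */
            WRITE_ONCE(prev->next, node);

            /* spin on our own node->locked until our predecessor hands over */
            arch_mcs_spin_lock_contended(&node->locked);

            /* optimistically load next and prefetch its cacheline for writing */
            next = READ_ONCE(node->next);
            if (next)
                    prefetchw(next);
    }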

If a next node has already appeared, we prefetch the cache line it points to with the PREFETCHW instruction. We preload this pointer now as an optimization: we have just become the head of the queue, which means an MCS unlock operation is coming up and the next entry will be touched soon.

Yes, from this moment we are at the head of the queue. But before we are able to acquire the lock, we need to wait for at least two events: the current owner of the lock releasing it, and the pending thread (if there is one) taking and releasing its turn. In other words, we spin until both the locked and the pending parts of lock->val are clear:
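(A slightly simplified excerpt.)

    /* wait for both the locked byte and the pending bit to clear */
    val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));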

After both of them have released the lock, the head of the queue claims it. In the end we just need to take the lock, clear the tail of the queue if we were the last waiter (or hand the MCS lock to the next node otherwise), and release our per-CPU node:
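(An abridged excerpt of the final steps; paravirt hooks and lock-event accounting are omitted, and the exact atomic helpers differ between kernel versions.)

    locked:
            /*
             * If we are the only queued waiter and nobody is pending, clear
             * the tail and grab the lock in one atomic step.
             */
            if ((val & _Q_TAIL_MASK) == tail) {
                    if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
                            goto release;   /* no contention left */
            }

            /* otherwise just set the locked byte ... */
            set_locked(lock);

            /* ... wait for our successor to appear and hand the MCS lock over */
            if (!next)
                    next = smp_cond_load_relaxed(&node->next, (VAL));

            arch_mcs_spin_unlock_contended(&next->locked);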

Conclusion

This is the end of the second part of the synchronization primitives chapter in the Linux kernel. In the previous part we met the first synchronization primitive provided by the Linux kernel — the spinlock, implemented as a ticket spinlock. In this part we saw another implementation of the spinlock mechanism — the queued spinlock. In the next part we will continue to dive into synchronization primitives in the Linux kernel.

If you have questions or suggestions, feel free to ping me on Twitter 0xAX, drop me an email or just create an issue.

Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to linux-insides.
