Linux kernel wait queue

One problem that might arise with the read operation is what to do when there's no data yet, but we're not at end-of-file. The default answer is "go to sleep waiting for data." This section shows how a process is put to sleep, how it is awakened, and how an application can ask if there is data without just blindly issuing a read call and blocking. We then apply the same concepts to write.

Whenever a process must wait for an event (such as the arrival of data or the termination of a process), it should go to sleep. Sleeping causes the process to suspend execution, freeing the processor for other uses. At some future time, when the event being waited for occurs, the process will be woken up and will continue with its job.

There are several ways of handling sleeping and waking up in Linux, each suited to different needs. All, however, work with the same basic data type, a wait queue (wait_queue_head_t). A wait queue is exactly that—a queue of processes that are waiting for an event. Wait queues are declared and initialized as follows:

wait_queue_head_t my_queue;
init_waitqueue_head (&my_queue);

When a wait queue is declared statically (i.e., not as an automatic variable of a procedure or as part of a dynamically-allocated data structure), it is also possible to initialize the queue at compile time:

DECLARE_WAIT_QUEUE_HEAD (my_queue);

It is a common mistake to neglect to initialize a wait queue (especially since earlier versions of the kernel did not require this initialization); if you forget, the results will usually not be what you intended.

Once the wait queue is declared and initialized, a process may use it to go to sleep. Sleeping is accomplished by calling one of the variants of sleep_on, depending on how deep a sleep is called for.

sleep_on(wait_queue_head_t *queue);

Puts the process to sleep on this queue. sleep_on has the disadvantage of not being interruptible; as a result, the process can end up being stuck (and unkillable) if the event it’s waiting for never happens.

interruptible_sleep_on(wait_queue_head_t *queue);

The interruptible variant works just like sleep_on, except that the sleep can be interrupted by a signal. This is the form that device driver writers had been using for a long time before wait_event_interruptible appeared.

sleep_on_timeout(wait_queue_head_t *queue, long timeout);
interruptible_sleep_on_timeout(wait_queue_head_t *queue, long timeout);

These two functions behave like the previous two, with the exception that the sleep will last no longer than the given timeout period. The timeout is specified in "jiffies".

void wait_event(wait_queue_head_t queue, int condition);
int wait_event_interruptible(wait_queue_head_t queue, int condition);

These macros are the preferred way to sleep on an event. They combine waiting for an event and testing for its arrival in a way that avoids race conditions: they will sleep until the condition, which may be any boolean C expression, evaluates to true. The macros expand to a while loop, and the condition is reevaluated over time; the behavior is different from that of a function call or a simple macro, where the arguments are evaluated only at call time. The latter macro is implemented as an expression that evaluates to 0 in case of success and to -ERESTARTSYS if the loop is interrupted by a signal.

It is worth repeating that driver writers should almost always use the interruptible instances of these functions and macros. The noninterruptible versions exist for the small number of situations in which signals cannot be dealt with, for example, when waiting for a data page to be retrieved from swap space. Most drivers do not present such special situations.

Of course, sleeping is only half of the problem; something, somewhere will have to wake the process up again. When a device driver sleeps directly, there is usually code in another part of the driver that performs the wakeup, once it knows that the event has occurred. Typically a driver will wake up its sleepers in its interrupt handler once new data has arrived. Other scenarios are possible, however. Just as there is more than one way to sleep, so there is more than one way to wake up. The high-level functions provided by the kernel to wake up processes are as follows:


wake_up(wait_queue_head_t *queue);
This function will wake up all processes that are waiting on this event queue.

wake_up_interruptible(wait_queue_head_t *queue);
wake_up_interruptible wakes up only the processes that are in interruptible sleeps. Any process that sleeps on the wait queue using a noninterruptible function or macro will continue to sleep.

wake_up_sync(wait_queue_head_t *queue);

Normally, a wake_up call can cause an immediate reschedule to happen, meaning that other processes might run before wake_up returns. The "synchronous" variants instead make any awakened processes runnable, but do not reschedule the CPU. This is used to avoid an extra reschedule when the current process is known to be going to sleep, since going to sleep forces a reschedule anyway. Note that awakened processes could run immediately on a different processor, so these functions should not be expected to provide mutual exclusion.

If your driver is using interruptible_sleep_on, there is little difference between wake_up and wake_up_interruptible. Calling the latter is a common convention, however, to preserve consistency between the two calls.

As an example of wait queue usage, imagine you want to put a process to sleep when it reads your device and awaken it when someone else writes to the device.

The following code does just that:

static DECLARE_WAIT_QUEUE_HEAD(wq);    /* the wait queue used by both methods */

ssize_t sleepy_read (struct file *filp, char *buf, size_t count,
                     loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n",
           current->pid, current->comm);
    interruptible_sleep_on(&wq);
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}

ssize_t sleepy_write (struct file *filp, const char *buf, size_t count,
                      loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
           current->pid, current->comm);
    wake_up_interruptible(&wq);
    return count; /* succeed, to avoid retrial */
}

An important thing to remember with wait queues is that being woken up does not guarantee that the event you were waiting for has occurred; a process can be woken for other reasons, mainly because it received a signal. Any code that sleeps should do so in a loop that tests the condition after returning from the sleep.
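In practice, the wait_event macros described earlier implement exactly that loop for you. As an illustrative sketch (not code from the original module), the sleepy pair above could be rewritten with wait_event_interruptible and wake_up_interruptible, using a flag as the condition; the flag name and the decision to reset it in the reader are arbitrary choices:

static DECLARE_WAIT_QUEUE_HEAD(wq);
static int flag;                        /* condition: set once a writer has arrived */

ssize_t sleepy_read (struct file *filp, char *buf, size_t count, loff_t *pos)
{
    /* Sleep until flag becomes nonzero; the condition is retested after
     * every wakeup, so spurious wakeups and signals are handled for us. */
    if (wait_event_interruptible(wq, flag != 0))
        return -ERESTARTSYS;            /* interrupted by a signal */
    flag = 0;                           /* consume the event */
    return 0;                           /* EOF */
}

ssize_t sleepy_write (struct file *filp, const char *buf, size_t count, loff_t *pos)
{
    flag = 1;                           /* make the condition true... */
    wake_up_interruptible(&wq);         /* ...then wake any sleeping readers */
    return count;
}

Because wait_event_interruptible sets the task state before re-testing the condition, a wakeup that arrives between the test and the sleep is not lost, which is precisely the race that blind sleeping is prone to.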


Simple wait queues


A look at the wait queue implementation in the 2.0 kernel reveals a simple data structure: a basic linked list of waiting threads. A wake_up() call on a wait queue would walk the list, putting each thread into the runnable state; there was not a whole lot more to it than that. Then, in 1999, the infamous Mindcraft study pointed out some performance deficiencies in Linux; one of those was the "thundering herd" problem, where multiple processes would be awakened and contend for a resource that only one of them could obtain. As a result, the "exclusive wait" functionality, where only the first of possibly many waiting threads would wake, was added. Then a callback mechanism was added in the 2.5 series so that the new asynchronous I/O facility could step in when things would otherwise block. And so on.

The end result is a data structure that is far larger and more complex than it was in the 2.0 days. It is the callback feature that was most problematic for the realtime tree, though; since those callbacks can sleep, they prevent the use of "raw" spinlocks to protect the wait queues themselves. To work around this problem, Thomas Gleixner created a new "simple wait queue" mechanism that would dispense with most of the added functionality and, thus, be suitable for use in the realtime kernel.

The 2013 Realtime Linux Workshop identified this code as a candidate for a relatively easy move into the mainline. In response, Paul Gortmaker has extracted the simple wait queue facility and posted the resulting patch series for review.

The code looks a lot like a return to the 2.0 kernel; much of the functionality that wait queues have gained in the meantime has been stripped away, leaving a familiar-looking linked list of waiting threads. There is no exclusive wakeup feature, no callback feature, and not much of anything else. What there is, though, is a wait queue mechanism that is sufficient for the needs of most wait queue users (of which there are many) in the kernel.

The API is similar to that of existing wait queues. Wait queue entries and wait queue heads are defined with:
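(A sketch rather than a verbatim listing; the macro names here are assumed from the realtime tree's wait-simple code and may not match what is eventually merged.)

DEFINE_SWAITER(my_waiter);      /* a single waiting-thread entry */
DEFINE_SWAIT_HEAD(my_queue);    /* the queue head itself */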

The low-level API, which requires a direct call to schedule() to put the calling thread to sleep, looks like this:
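(Again a sketch under the same naming assumptions; "condition" stands for whatever the caller is waiting on.)

DEFINE_SWAIT_HEAD(my_queue);
struct swaiter w;

swait_prepare(&my_queue, &w, TASK_INTERRUPTIBLE);   /* enqueue and mark as sleeping */
if (!condition)
        schedule();                                 /* actually sleep until woken */
swait_finish(&my_queue, &w);                        /* dequeue and clean up */

If the condition can still be false after a wakeup, the prepare-and-schedule steps are repeated in a loop, just as with ordinary wait queues.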

The swait_prepare() call is used to add the process to the given wait queue head and put it into the appropriate sleeping state. After performing any necessary checks and calling schedule(), the newly woken thread will call swait_finish() to remove itself from the queue and clean up.

The current wait queue implementation has an extensive set of macros to simplify the task of waiting for a condition; there is a similar, but much smaller set for simple wait queues:
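(Roughly the following; treat the exact set of names as an assumption based on the patch series rather than a definitive list.)

swait_event(queue, condition);
swait_event_interruptible(queue, condition);
swait_event_interruptible_timeout(queue, condition, timeout);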

Most of the other versions of wait_event(), including the "killable" variants, are not provided. It is amusing to look at a list of wait_event() macros that lack equivalents in the new API, just to see how this interface has grown over the years:
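(For illustration, here is a partial list of variants present in the mainline <linux/wait.h> of the 3.x kernels that have no "simple" counterpart; the selection is indicative rather than the exact list from the patch posting.)

wait_event_timeout()
wait_event_killable()
wait_event_interruptible_timeout()
wait_event_interruptible_exclusive()
wait_event_interruptible_locked()
wait_event_interruptible_locked_irq()
wait_event_interruptible_lock_irq()
wait_event_interruptible_lock_irq_cmd()
wait_event_lock_irq()
wait_event_lock_irq_cmd()
wait_event_hrtimeout()
wait_event_interruptible_hrtimeout()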

There is little impediment to adding "simple" versions of most of the above macros should the need arise; it will be interesting to see how many of them show up in the coming years. Needless to say, there is also nothing like the archaic sleep_on() interface; it is safe to say nobody will try to add a version of that.

Paul's posting notes that adding the simple wait queues makes the kernel smaller, even when they are only used in a couple of places. Given the size reduction and the relative simplicity of the interface, it is unsurprising that there has been no opposition to adding this code so far. The only real question is how that addition is to be done. Christoph Hellwig suggested that the simple wait queues could simply replace the current implementation, with the few places needing the fancier functionality being changed to use the older code under a new name. Paul, though, worried that such a wholesale change would create a flag day with problems being associated with the wait queue change in mysterious ways.

Nobody wants that kind of situation, so it seems more likely that simple wait queues will retain their "swait" naming scheme. The kernel might see a wholesale naming change for the existing wait queues to make it clear that there is now a choice to be made, though. Thus, we may see a large patch changing wait_event() to cwait_event(), and so on, without changing functionality; after that, individual call sites could be changed to simple wait queues at leisure. The result would be a fair amount of code churn, but that churn should leave a smaller and simpler kernel in its wake.


Simple wait queues

Posted Dec 19, 2013 8:39 UTC (Thu) by johill (subscriber, #25196) [Link]

With swait_prepare()/_finish(), it seems adding lockdep annotations might actually be possible? The current waitqueues make that impossible, but unfortunately deadlocks are possible. The swait code doesn't contain any such annotations right now, but I think it would be very useful.

Simple wait queues

Posted Dec 19, 2013 9:19 UTC (Thu) by smurf (subscriber, #17840) [Link]

>> sleep_on() will undoubtedly exist when the 2.7.0 kernel is released, but there may be very few callers of it by then

… and while there never was a 2.7, sleep_on() lingers on, in a few old device drivers …

Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license


Linux kernel - wait queues

I'm reading "Linux Kernel Development", 3rd edition, by Robert Love, to get a general idea of how the Linux kernel works (2.6.2.3).

I'm confused about how wait queues work. For example, this code:
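(What follows is a reconstruction of the manual sleep recipe from Linux Kernel Development, lightly tidied, so treat the details as approximate rather than as the asker's exact snippet.)

/* 'q' is the wait queue to sleep on; 'condition' is the event awaited. */
DEFINE_WAIT(wait);

add_wait_queue(&q, &wait);
while (!condition) {
        prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
        if (signal_pending(current))
                break;          /* a signal woke us up; let the caller handle it */
        schedule();             /* give up the CPU until a wakeup arrives */
}
finish_wait(&q, &wait);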

I want to know: which process runs this code? Is it a kernel thread? Whose process time is this?

Also, in the loop, while the condition is not yet met we keep sleeping and call schedule() to run another process. The question is: when do we return to this loop?

The book says that when a process sleeps, it is removed from the run queue; otherwise it would be woken up and would have to busy-loop.

  • The book also says: "sleeping should always be handled in a loop that ensures that the condition for which the task is waiting has indeed occurred."
  • I just want to know in what context this loop is running.

Sorry if this is a stupid question; I'm just having trouble seeing the big picture.

2 Answers

Which process is running the code? The process that called it. I don't mean to make fun of the question, but the gist is that kernel code can run in different contexts: either because a system call led to this place, because it is in an interrupt handler, or because it is a callback function called from another context (such as workqueues or timer functions).

Since this example is sleeping, it must be in a context where sleeping is allowed, meaning it is executed in response to a system call or at least in a kernel thread. So the answer is: the process time is taken from the process (or kernel thread) that called into this kernel code that needs to sleep. That is the only place where sleeping is allowed in the first place.

A special case is workqueues; these exist explicitly for functions that need to sleep. A typical use is to queue a function that needs to sleep from a context where sleeping is forbidden. In that case, the process context is that of one of the kernel worker threads designated to process workqueue items.

You will return to this loop when the wait queue is woken up, which sets either one task waiting on the queue or all of them to runnable, depending on the wake_up function called.

The most important thing is: forget about this unless you are interested in the implementation details. Since many people got this wrong, and it's basically the same thing everywhere it's needed, there have long been macros encapsulating the whole procedure. Look up wait_event(); that is how your example should really look:
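(A minimal sketch; q and condition are placeholders for the real wait queue and the event being waited for.)

/* Sleeps until 'condition' becomes true.  The condition is re-tested
 * automatically each time the task is woken, so no hand-written loop
 * around schedule() is needed. */
wait_event(q, condition);

/* Or, if the sleep should be interruptible by signals: */
if (wait_event_interruptible(q, condition))
        return -ERESTARTSYS;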

