Linux block device driver code

Please include the original text or a link to the author’s website when reprinting: http://oliveryang.net

1. Background

Sampleblk is a Linux block device driver project for learning purposes. Its day1 source code implements the simplest possible block device driver in just over 200 lines. This article walks through that source code and discusses the basics of Linux block device driver development.

Developing a Linux driver requires some preparation of the development environment. The Sampleblk driver was developed and debugged on Linux 4.6.0. Because the API of the generic block layer changes significantly between kernel versions, this driver may fail to compile on other kernel versions. To develop, compile, and debug a kernel module, you need to set up a kernel development environment and build the kernel source code. These basics are covered widely on the Internet, so this article does not repeat them.

In addition, the classic book on Linux device driver development is Linux Device Drivers, Third Edition, abbreviated LDD3. The book is free: you can download it and redistribute it in accordance with its license.

2. Module initialization and exit

Linux driver modules are developed within the basic framework and API that Linux provides for module developers. The hello world module in LDD3 is an example of a minimal kernel module. The Sampleblk block driver module is similar: it implements the module initialization and exit functions that every Linux kernel module needs.

Unlike the hello world module, the initialization and exit functions of the Sampleblk driver must implement the basic functionality that a block device driver requires. This section explains that part in detail.

2.1 sampleblk_init

In summary, to initialize the block device driver, the sampleblk_init function mainly does the following things,

2.1.1 Block device registration

Call register_blkdev to allocate and register the major number. The function prototype is as follows,

The Linux kernel maintains a global hash table, major_names, for block device drivers. The buckets of this hash table form an array of pointers to struct blk_major_name, indexed by an integer in [0..255].

When the major parameter of register_blkdev is not 0, the implementation looks up the bucket for the specified major in this hash table, allocates a new blk_major_name, and initializes its major and name from the given parameters. If the specified major is already taken by another driver (the pointer is not empty), the major numbers conflict and an error is returned.

When the major parameter is 0, the kernel allocates an unused number from the integer range [1..255] for the caller. Therefore, although a Linux kernel major number is 12 bits wide, a caller that does not specify major is still given a number from the range [1..255].
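The allocation rules above can be modelled in user space. The following is only a sketch, not the kernel implementation: the names model_register_blkdev, model_unregister_blkdev, and model_major_names are hypothetical, the table size is taken from the [0..255] range described above, the direction in which free slots are scanned is an assumption, and hash chaining for majors above 255 is left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_TABLE_SIZE 256   /* assumed from the [0..255] range above */

/* Hypothetical stand-in for struct blk_major_name. */
struct model_major_name {
    unsigned int major;
    char name[16];
};

static struct model_major_name *model_major_names[MODEL_TABLE_SIZE];

/*
 * major != 0: claim that slot, or fail with -EBUSY if it is taken.
 * major == 0: pick any free slot in [1..255] and return its number.
 * Returns 0 for an explicit major, the chosen major for a dynamic one,
 * or a negative errno value on failure.
 */
int model_register_blkdev(unsigned int major, const char *name)
{
    int dynamic = (major == 0);
    struct model_major_name *p;

    if (dynamic) {
        /* Scan order (high to low) is an assumption of this sketch. */
        for (major = MODEL_TABLE_SIZE - 1; major > 0; major--) {
            if (model_major_names[major] == NULL)
                break;
        }
        if (major == 0)
            return -EBUSY;          /* every slot in [1..255] is taken */
    } else if (major >= MODEL_TABLE_SIZE) {
        return -EINVAL;             /* larger majors are not modelled */
    } else if (model_major_names[major] != NULL) {
        return -EBUSY;              /* major number conflict */
    }

    p = calloc(1, sizeof(*p));
    if (!p)
        return -ENOMEM;
    p->major = major;
    strncpy(p->name, name, sizeof(p->name) - 1);  /* calloc keeps it terminated */
    model_major_names[major] = p;
    return dynamic ? (int)major : 0;
}

/* Frees the slot so the major number can be registered again. */
void model_unregister_blkdev(unsigned int major)
{
    if (major < MODEL_TABLE_SIZE) {
        free(model_major_names[major]);
        model_major_names[major] = NULL;
    }
}
```

Calling model_register_blkdev(0, "sampleblk") in this model returns a number in [1..255], and a second registration of that same number fails with -EBUSY, which mirrors the conflict case described above.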

The Sampleblk driver specifies a major of 0 so that the kernel allocates and registers an unused major number for it. The code is as follows,

2.1.2 Allocation and initialization of the driver state data structure

Typically, a Linux kernel driver declares a data structure to hold the state information that the driver needs to access frequently. We declare one for the Sampleblk driver as well,

To simplify the implementation and make debugging easier, the Sampleblk driver supports only one minor device number for now, and the device can be accessed through the following global variable,

The following code allocates the sampleblk_dev structure and initializes its members,

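    /* Total capacity in bytes: sector size times number of sectors. */
    sampleblk_dev->size = sampleblk_sect_size * sampleblk_nsects;
    /* The ramdisk data area lives in vmalloc memory. */
    sampleblk_dev->data = vmalloc(sampleblk_dev->size);
    if (!sampleblk_dev->data) {
        rv = -ENOMEM;
        goto fail_dev;
    }
    sampleblk_dev->minor = minor;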
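The goto-based error handling in the fragment above is a common kernel idiom: each failure jumps to a label that releases only what has been allocated so far. A minimal user-space sketch of the same pattern follows. The names model_dev, model_dev_create, and model_dev_destroy are hypothetical, and malloc/calloc stand in for the kernel allocators.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the driver state structure. */
struct model_dev {
    int minor;
    size_t size;            /* capacity in bytes */
    unsigned char *data;    /* ramdisk data area */
};

/*
 * Allocates the state structure, then its data area. On failure, whatever
 * was already allocated is released and *out stays NULL. A zero-sized
 * device is treated as an allocation failure in this sketch.
 */
int model_dev_create(struct model_dev **out, int minor,
                     size_t sect_size, size_t nsects)
{
    struct model_dev *dev;
    int rv;

    *out = NULL;
    dev = calloc(1, sizeof(*dev));
    if (!dev) {
        rv = -ENOMEM;
        goto fail;
    }

    dev->size = sect_size * nsects;
    dev->data = dev->size ? malloc(dev->size) : NULL;
    if (!dev->data) {
        rv = -ENOMEM;
        goto fail_dev;      /* undo only the first allocation */
    }
    dev->minor = minor;

    *out = dev;
    return 0;

fail_dev:
    free(dev);
fail:
    return rv;
}

/* Releases the resources in the reverse order of their allocation. */
void model_dev_destroy(struct model_dev *dev)
{
    if (!dev)
        return;
    free(dev->data);
    free(dev);
}
```

With an assumed sector size of 512 bytes and 1024 sectors, model_dev_create yields a size of 524288 bytes, which matches the size value shown in the crash output later in this article.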

2.1.3 Request Queue initialization

To initialize the request queue with blk_init_queue, you need to declare a so-called strategy callback and a spinlock that protects the request queue. The function pointer of the strategy callback and the pointer to the spinlock are then passed to blk_init_queue as parameters.

In the Sampleblk driver, these are the sampleblk_request function and sampleblk_dev->lock,

The strategy function sampleblk_request performs the read and write IO operations on the block device. Its main parameter is the request queue structure, struct request_queue. The implementation of the strategy function is covered later.

When blk_init_queue executes, its internal implementation does the following,

  1. Allocates a struct request_queue from memory.
  2. Initializes the struct request_queue. For the caller, the following parts of the initialization are particularly important,
    • The strategy function pointer passed to blk_init_queue is assigned to the request_fn member of struct request_queue.
    • The spinlock pointer passed to blk_init_queue is assigned to the queue_lock member of struct request_queue.
    • The IO scheduler associated with this request_queue is initialized.
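The registration step can be illustrated with a small user-space model. This is only a sketch under stated assumptions: model_queue, model_lock, model_init_queue, model_run_queue, and model_strategy are hypothetical names, the lock is a plain flag instead of a spinlock, and the IO scheduler setup is not modelled at all.

```c
#include <stdlib.h>

struct model_queue;

/* Function type of the strategy callback in this model. */
typedef void (model_request_fn)(struct model_queue *q);

/* Toy lock: it only records whether it is currently held. */
struct model_lock {
    int held;
};

struct model_queue {
    model_request_fn *request_fn;   /* strategy callback from the driver */
    struct model_lock *queue_lock;  /* lock owned by the driver */
    int nr_calls;                   /* bookkeeping for the example only */
};

/* Allocates a queue and stores the callback and lock pointers in it. */
struct model_queue *model_init_queue(model_request_fn *fn,
                                     struct model_lock *lock)
{
    struct model_queue *q;

    if (!fn || !lock)
        return NULL;
    q = calloc(1, sizeof(*q));
    if (!q)
        return NULL;
    q->request_fn = fn;
    q->queue_lock = lock;
    return q;
}

/* The generic layer takes the lock, then invokes the strategy callback. */
void model_run_queue(struct model_queue *q)
{
    q->queue_lock->held = 1;
    q->request_fn(q);
    q->queue_lock->held = 0;
}

/* Example strategy callback: counts calls made with the lock held. */
void model_strategy(struct model_queue *q)
{
    if (q->queue_lock->held)
        q->nr_calls++;
}
```

The point of the model is that the driver supplies both pointers once at initialization time, and the generic layer later reaches the driver only through the stored request_fn pointer.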

The Linux kernel provides several ways to allocate and initialize a request queue,

  • blk_mq_init_queue is mainly used by block device drivers that use multi-queue technology.
  • blk_alloc_queue together with blk_queue_make_request is mainly used to bypass the merging and sorting done by the kernel's IO schedulers and use a custom implementation instead.
  • blk_init_queue uses the IO schedulers supported by the kernel, so the driver only has to implement the strategy function.

The Sampleblk driver belongs to the third case. To repeat: if a block device driver needs the standard IO scheduler to merge or sort IO requests, it must use blk_init_queue to allocate and initialize the request queue.

2.1.4 Block device operation function table initialization

The Linux block device operation function table, block_device_operations, is defined in include/linux/blkdev.h. By defining this table, a block device driver can customize the standard block device operations. If the driver does not implement a method defined in the table, the Linux block layer falls back to the default behavior of the generic block layer code.

Although the Sampleblk driver declares its own open, release, and ioctl methods, none of the corresponding driver functions does any real work. The actual behavior of the block device is therefore implemented by the generic block layer.
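The fallback behavior of an operation table can be sketched in user space. The code below is a simplified model and not the kernel's block_device_operations: the structure model_bdev_ops, the functions model_generic_open and model_generic_ioctl, and the choice of -ENOTTY as the default ioctl result are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct model_bdev {
    int open_count;     /* bookkeeping done by the generic layer */
};

/* Hypothetical operation table; a NULL entry means "use the default". */
struct model_bdev_ops {
    int (*open)(struct model_bdev *bdev);
    int (*ioctl)(struct model_bdev *bdev, unsigned int cmd);
};

/* Generic open: calls the driver hook if present, then does its own work. */
int model_generic_open(struct model_bdev *bdev,
                       const struct model_bdev_ops *ops)
{
    int rv = 0;

    if (ops && ops->open)
        rv = ops->open(bdev);
    if (rv == 0)
        bdev->open_count++;
    return rv;
}

/* Generic ioctl: without a driver method, the command is not supported. */
int model_generic_ioctl(struct model_bdev *bdev,
                        const struct model_bdev_ops *ops, unsigned int cmd)
{
    if (ops && ops->ioctl)
        return ops->ioctl(bdev, cmd);
    return -ENOTTY;
}

/* Driver methods that are declared but do no real work. */
static int model_drv_open(struct model_bdev *bdev)
{
    (void)bdev;
    return 0;
}

static int model_drv_ioctl(struct model_bdev *bdev, unsigned int cmd)
{
    (void)bdev;
    (void)cmd;
    return -ENOTTY;
}

const struct model_bdev_ops model_drv_ops = {
    .open = model_drv_open,
    .ioctl = model_drv_ioctl,
};
```

In this model, an empty table and model_drv_ops lead to the same observable behavior, which is the situation described above for the Sampleblk methods.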

2.1.5 Disk creation and initialization

The Linux kernel uses struct gendisk to abstract and represent a disk. In other words, to support normal block device operations, a block device driver must allocate and initialize a struct gendisk.

First, use alloc_disk to allocate a struct gendisk,

Then, initialize the important members of struct gendisk, in particular the block device operation function table, the request queue, and the capacity. Finally, call add_disk to make the disk visible in the system and trigger the disk hotplug uevent.

2.2 sampleblk_exit

This is the inverse of sampleblk_init,

Stop and release the block device IO request queue.

Before Linux 3.8, the kernel had a serious bug when blk_run_queue and blk_cleanup_queue executed at the same time.
I recently discovered this bug in a Surprise Remove stress test with disk IO (to be honest, I was a little surprised that this bug had been around for so long without anyone discovering it).

Release the data area.

Release the driver's global data structure.

Unregister the block device.

3. Strategy function implementation

To understand how the strategy function of a block device driver is implemented, you must first understand the key data structures of the Linux IO stack.

3.1 struct request_queue

This structure is the queue of pending IO requests for the block device driver. If the queue is allocated and initialized with blk_init_queue, the IO requests in the queue (struct request) are processed (sorted or merged) by the IO scheduler, which is triggered by blk_queue_bio.

When the block device driver's strategy function is called, each request is linked through its queuelist member into the queue_head list of struct request_queue. A single IO request queue can hold many request structures.

3.2 struct bio

A bio logically represents an IO request that a task issues to the generic block layer. IO requests from different applications, different contexts, and different threads are encapsulated in separate bio data structures at the block device driver layer.

The data of a single bio consists of physically consecutive sectors on the block device, beginning at a starting sector. Because sectors that are contiguous on the block device are not guaranteed to be contiguous in physical memory, the concept of a segment is introduced. The block device sectors within one segment are contiguous in physical memory, but contiguity of physical memory is not guaranteed between segments. A segment is never longer than a memory page, and its length is always an integer multiple of the sector size.

The following figure shows the layout of sectors, blocks, and segments within a memory page, and the relationship between them (Note: the figure is taken from Understanding the Linux Kernel, 3rd Edition; the copyright belongs to the original author),

A segment can therefore be uniquely identified by [page, offset, len]. A bio structure can contain multiple segments, and it expresses this one-to-many relationship through an array of segment descriptors.
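To make the [page, offset, len] triple concrete, the following user-space sketch splits a byte range of a flat buffer into segments that never cross a page boundary. It is a model only: model_bio_vec and model_build_segments are hypothetical names, a page index stands in for struct page, the 4096-byte page and 512-byte sector sizes are assumptions, and every page is treated as its own segment because the model pretends that neighbouring pages are not physically contiguous.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096u   /* assumed page size */
#define MODEL_SECTOR_SIZE 512u    /* assumed sector size */

/* Mirrors the [page, offset, len] triple that identifies a segment. */
struct model_bio_vec {
    size_t page;            /* page index, stands in for struct page * */
    unsigned int offset;    /* byte offset inside the page */
    unsigned int len;       /* segment length in bytes */
};

/*
 * Splits the byte range [addr, addr + len) into page-bounded segments.
 * Returns the number of segments written to vec, or -1 if the range is
 * not sector aligned or needs more than max segments.
 */
int model_build_segments(size_t addr, size_t len,
                         struct model_bio_vec *vec, int max)
{
    int n = 0;

    if (addr % MODEL_SECTOR_SIZE || len % MODEL_SECTOR_SIZE)
        return -1;

    while (len > 0) {
        unsigned int offset = (unsigned int)(addr % MODEL_PAGE_SIZE);
        size_t room = MODEL_PAGE_SIZE - offset;     /* bytes left in page */
        size_t chunk = len < room ? len : room;

        if (n == max)
            return -1;
        vec[n].page = addr / MODEL_PAGE_SIZE;
        vec[n].offset = offset;
        vec[n].len = (unsigned int)chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, a range of 5120 bytes starting at byte 3584 produces three segments in this model: the last sector of page 0, all of page 1, and the first sector of page 2. Every segment length is a multiple of the sector size and none exceeds the page size, as the text above requires.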

In struct bio, the member bi_io_vec is the base address of the segment array mentioned above, and each element of the array is a struct bio_vec.

struct bio_vec, in turn, is the data structure that describes one segment,

Another member of struct bio, bi_vcnt, describes how many segments this bio has, that is, the number of elements in the array. The maximum number of segments/pages that a bio can contain is determined by the following kernel macro definition,

Multiple bio structures can be linked into a list through the bi_next member. A bio list can be the list maintained by the bio_list member of the task_struct of a task that is doing IO. It can also be a list that belongs to a struct request (covered in the next section).

The figure below shows bio structures linked into a list through bi_next. Each bio structure has a one-to-many relationship with its segments/pages (Note: the figure is taken from Professional Linux Kernel Architecture; the copyright belongs to the original author),

3.3 struct request

A request logically represents an IO request received by the block device driver layer. The data of the IO request consists of physically consecutive sectors on the block device, beginning at a starting sector.

A struct request can contain many struct bio structures, linked into a list mainly through the bi_next member of the bio structure. The first bio in this list is pointed to by the bio member of struct request, and the tail of the list is pointed to by the biotail member.
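The head and tail pointers make it cheap to append a bio to the end of a request. The user-space sketch below models that with a simple back-merge rule: a bio is appended only when it starts at the sector where the request currently ends. The types model_bio and model_request and the function model_back_merge are hypothetical, and real merge decisions in the kernel involve more conditions than this one check.

```c
#include <stddef.h>

struct model_bio {
    unsigned long sector;       /* starting sector of this bio */
    unsigned int nr_sectors;    /* length of this bio in sectors */
    struct model_bio *bi_next;  /* next bio in the same request */
};

struct model_request {
    unsigned long pos;          /* starting sector of the whole request */
    unsigned int nr_sectors;    /* total length in sectors */
    struct model_bio *bio;      /* first bio in the list */
    struct model_bio *biotail;  /* last bio in the list */
};

/*
 * Appends bio to rq if rq is empty or if bio begins exactly where rq ends.
 * Returns 1 when the bio was added, 0 when it does not fit this request.
 */
int model_back_merge(struct model_request *rq, struct model_bio *bio)
{
    if (rq->bio == NULL) {
        bio->bi_next = NULL;
        rq->bio = bio;
        rq->biotail = bio;
        rq->pos = bio->sector;
        rq->nr_sectors = bio->nr_sectors;
        return 1;
    }

    if (rq->pos + rq->nr_sectors != bio->sector)
        return 0;               /* not contiguous on the device */

    bio->bi_next = NULL;
    rq->biotail->bi_next = bio; /* link after the current tail */
    rq->biotail = bio;
    rq->nr_sectors += bio->nr_sectors;
    return 1;
}
```

Only the link field and the request's own bookkeeping change during the merge; the sector range described by each bio stays as it was.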

After receiving bios from different threads, the generic block layer usually chooses one of the following two options, depending on the situation,

Merge the bio into an existing request

Because each bio structure comes from a different task, IO request merging can only be done at the level of the request structure, by inserting into and sorting the linked list. The internal structure of the bio is not modified.

Allocate a new request

Wait for the upper-layer task to trigger blk_run_queue through blk_finish_plug. The block device driver's strategy function request_fn then triggers the sorting operation of the IO scheduler, which inserts the request in sorted order into the block device driver's IO request queue.

In either case, the generic block layer code calls the request_fn callback that the block driver registered in the request_queue. This callback eventually hands the merged or sorted requests to the driver's low-level functions, which perform the IO operations.

3.4 Strategy function request_fn

As mentioned earlier, when a block device driver uses blk_init_queue to allocate and initialize the request_queue, the driver must also pass in its custom strategy function request_fn and the required spinlock queue_lock. When implementing its own request_fn, the driver needs to be aware of the following characteristics,

When the generic block layer code calls request_fn, the kernel has already taken the queue_lock of this request_queue. The context at this point is therefore atomic. Until the driver's strategy function releases queue_lock, it must obey the kernel's constraints for atomic context.

While the driver's strategy function is running, the generic block layer code may access the request_queue concurrently. To reduce contention on the queue_lock of the request_queue, the block driver's strategy function should release queue_lock as early as possible, and then reacquire the lock before the strategy function returns.

The strategy function is executed asynchronously and does not run in the kernel context of the corresponding user-mode process. The implementation therefore cannot assume that the strategy function runs in the kernel context of a user process.

The strategy function of Sampleblk is sampleblk_request, which is registered to the request_fn member of the request_queue through blk_init_queue.

The implementation logic of the strategy function sampleblk_request is as follows,

  1. Use blk_fetch_request in a loop to get each pending request in the queue. The kernel function blk_fetch_request returns a pointer to the first request in the queue_head list of struct request_queue, and then calls blk_dequeue_request to remove that request from the queue.
  2. Each time a request is obtained, release queue_lock immediately; after each request has been processed, reacquire queue_lock.
  3. REQ_TYPE_FS is used to check whether the request comes from the file system. This driver does not support non-file-system requests.
  4. blk_rq_pos returns the starting sector number of the request, and blk_rq_bytes returns the total number of bytes in the request, which should be an integer multiple of the sector size.
  5. The rq_for_each_segment macro iterates in a loop over each segment of a request, that is, each struct bio_vec. Note that the segments (bio_vec) start at the sector given by blk_rq_pos and cover physically consecutive sectors on the device, but the physical memory of different segments is not guaranteed to be contiguous.
  6. For each struct bio_vec, kmap can be used to obtain the virtual address of the page that holds the segment. bv_offset and bv_len then give the exact offset within the page and the length of the segment.
  7. rq_data_dir tells whether the request is a read or a write.
  8. After the request has been processed, blk_end_request_all must be called so that the generic block layer code can do the follow-up processing.
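The steps above can be sketched as a user-space model. This is not the driver's source: every model_ name is hypothetical, the queue is a plain array, the lock is a flag, a segment carries a ready-made virtual address in place of kmap plus bv_offset, and setting the done and error fields stands in for blk_end_request_all. The bounds check is an addition of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SECTOR_SIZE 512u  /* assumed sector size */

struct model_segment {
    unsigned char *kaddr;   /* plays the role of kmap(page) + bv_offset */
    unsigned int len;       /* plays the role of bv_len */
};

enum model_req_type { MODEL_REQ_FS, MODEL_REQ_OTHER };

struct model_request {
    enum model_req_type type;
    int write;              /* plays the role of rq_data_dir */
    unsigned long pos;      /* starting sector, role of blk_rq_pos */
    struct model_segment *segs;
    int nr_segs;
    int error;              /* completion status */
    int done;               /* set once the request is completed */
};

struct model_queue {
    struct model_request **pending;
    int head, count;
    int lock_held;          /* toy replacement for queue_lock */
};

struct model_disk {
    unsigned char *data;
    size_t size;
};

/* Returns the next pending request and removes it from the queue. */
static struct model_request *model_fetch_request(struct model_queue *q)
{
    if (q->head == q->count)
        return NULL;
    return q->pending[q->head++];
}

/* Strategy loop: entered and left with the queue lock held. */
void model_strategy(struct model_queue *q, struct model_disk *disk)
{
    struct model_request *rq;

    while ((rq = model_fetch_request(q)) != NULL) {
        size_t off = (size_t)rq->pos * MODEL_SECTOR_SIZE;
        int i, rv = 0;

        q->lock_held = 0;               /* drop the lock while doing IO */
        if (rq->type != MODEL_REQ_FS) {
            rv = -EIO;                  /* only file system requests */
        } else {
            for (i = 0; i < rq->nr_segs; i++) {
                struct model_segment *s = &rq->segs[i];

                if (off > disk->size || s->len > disk->size - off) {
                    rv = -EIO;          /* segment outside the disk */
                    break;
                }
                if (rq->write)
                    memcpy(disk->data + off, s->kaddr, s->len);
                else
                    memcpy(s->kaddr, disk->data + off, s->len);
                off += s->len;          /* next segment follows on disk */
            }
        }
        q->lock_held = 1;               /* retake the lock */
        rq->error = rv;
        rq->done = 1;                   /* complete the whole request */
    }
}
```

Note how the device offset advances across segments while each segment brings its own memory address: the sectors are consecutive on the device, but the buffers need not be adjacent in memory.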

The driver function sampleblk_handle_io performs the driver-level IO operation for each segment of a request. Before the driver function is called, the starting sector address pos, the length bv_len, the virtual memory address of the data kaddr + bvec.bv_offset, and the read/write flag are all prepared as parameters. Because the Sampleblk driver is only a ramdisk driver, the IO operation for each segment is implemented with memcpy,
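A minimal user-space sketch of such a memcpy-based IO routine is given here. It is an illustration and not the driver's actual function: model_ramdisk and model_handle_io are hypothetical names, pos is treated as a byte offset into the data area, and the bounds check is an addition of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct model_ramdisk {
    unsigned char *data;    /* backing memory of the ramdisk */
    size_t size;            /* capacity in bytes */
};

/*
 * Copies len bytes between buffer and the ramdisk at byte offset pos.
 * write != 0 copies from buffer into the ramdisk, write == 0 copies from
 * the ramdisk into buffer. Returns 0 on success or -EIO when the range
 * does not fit inside the ramdisk.
 */
int model_handle_io(struct model_ramdisk *dev, size_t pos, size_t len,
                    void *buffer, int write)
{
    if (pos > dev->size || len > dev->size - pos)
        return -EIO;

    if (write)
        memcpy(dev->data + pos, buffer, len);
    else
        memcpy(buffer, dev->data + pos, len);
    return 0;
}
```

Because the whole device lives in memory, a read that follows a write to the same offset returns the written bytes immediately, which is all that a ramdisk needs from its IO path.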

4. Test

4.1 Compile and load

First, download the kernel source code, compile and install the kernel, and boot into the new kernel.

The specific steps for compiling and installing a kernel are described widely on the Internet, so readers are left to work through them on their own.

After compiling the kernel, compile the driver module in the kernel directory.

After the driver compiles successfully, load the kernel module.

After the driver loads successfully, use the crash tool to view the contents of struct sampleblk_dev,

ffffffffa03bb580 sampleblk 2681 /home/yango/ws/lktm/drivers/block/sampleblk/day1/sampleblk.ko

crash7> p *sampleblk_dev
$4 = {
  minor = 1,
  lock = {
    {
      rlock = {
        raw_lock = {
          val = {
            counter = 0
          }
        }
      }
    }
  },
  queue = 0xffff880034ef9200,
  disk = 0xffff880000887000,
  size = 524288,
  data = 0xffffc90001a5c000
}

Note: For the use of Linux Crash, please refer to Extended Reading.

4.2 Troubleshooting a module reference problem

Problem: delete the entire body of the driver function sampleblk_request, then recompile and load the kernel module. When rmmod is then used to unload the module, the unload fails and the kernel reports that the module is in use.

With strace, you can observe that /sys/module/sampleblk/refcnt is non-zero, that is, the module is in use.

openat(AT_FDCWD, "/sys/module/sampleblk/holders", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
open("/sys/module/sampleblk/refcnt", O_RDONLY|O_CLOEXEC) = 3 /* The reference count displayed is 3 */
read(3, "1\n", 31) = 2
read(3, "", 29) = 0
close(3) = 0
write(2, "rmmod: ERROR: Module sampleblk i"..., 41rmmod: ERROR: Module sampleblk is in use
) = 41
exit_group(1) = ?
+++ exited with 1 +++

If you check with the lsmod command, you can see that the module's reference count is indeed 3, but no referrer name is shown. Normally, a referrer name appears only for references between kernel modules, so when no referrer name is shown, the reference comes from a process in user space.

So who is using the newly loaded sampleblk driver? The module:module_get tracepoint gives the answer. Reboot the kernel and, before loading the module, run the tpoint command. Then run insmod to load the module.

systemd-udevd-2986 [000] … 196.382796: module_get: sampleblk call_site=get_disk refcnt=2
systemd-udevd-2986 [000] … 196.383071: module_get: sampleblk call_site=get_disk refcnt=3

As you can see, systemd's udevd process is using the sampleblk device. If you are familiar with udevd, the reason may be immediately clear: udevd listens for hotplug events from all devices in the system and performs a series of operations on new devices according to predefined rules. When the sampleblk driver calls add_disk, the kobject layer code sends a hotplug uevent to udevd in user space, so udevd opens the block device and performs the related operations. With the crash command, it is easy to find which process has the sampleblk device open,

Because the implementation of the sampleblk_request function was deleted, the IO operations issued by udevd cannot be completed by the sampleblk device driver. udevd is therefore stuck in a long blocking wait until it times out, returns an error, and releases the device. This analysis can be confirmed from the system message log,

Note: tpoint is an open-source bash script tool based on ftrace that can be downloaded and run directly. It is part of an open-source project by Brendan Gregg on GitHub; the link to the project is given above.

If the deleted source code of the sampleblk_request function is added back, the problem goes away, because udevd can then finish accessing the sampleblk device quickly.

4.3 Create a file system

Although the Sampleblk block driver has only 200 lines of source code, it can already be used as a ramdisk, and a file system can be created on it.

After the file system is created successfully, mount the file system and create an empty file a. As you can see, everything works normally.

So far, the most basic function of sampleblk as a ramdisk has been tested.
