- linux NBD: Introduction to Linux Network Block Devices
- Getting started with NBD
- ServerSide
- ClientSide
- Important
- Block Device Drivers
- Lab objectives
- Overview
- Register a block I/O device
- Register a disk
- struct gendisk structure
- struct block_device_operations structure
- Request Queues — Multi-Queue Block Layer
- Software staging queues
- Hardware dispatch queues
- Tag sets
- Create and delete a request queue
- Useful functions for processing request queues
- Requests for block devices
- Create a request
- Process a request
- struct bio structure
- Create a struct bio structure
- Submit a struct bio structure
- Wait for the completion of a struct bio structure
- Initialize a struct bio structure
- How to use the content of a struct bio structure
- Free a struct bio structure
- Set up a request queue at struct bio level
linux NBD: Introduction to Linux Network Block Devices
Sep 16, 2019
A network block device (NBD) gives access to a remote storage device that does not physically reside in the local machine. Using a network block device, we can access and use the remote storage device in the following three ways on the local machine:
NBD presents a remote resource as a local resource to the client. The NBD driver makes a remote resource look like a local device in Linux, allowing a cheap and safe real-time mirror to be constructed.
You don’t need to compare NFS with NBD, because they are entirely different approaches to network storage.
Why NBD? Use-case scenario
Sometimes you may want to complete operations on a storage device at a lower level than a network file system (NFS) would support.
For example, maybe you want to format the device. Or you want to modify or copy entire partitions. These tasks would be impossible to accomplish with a network file system, because they would require you to have the file system unmounted in order to perform them — and if you unmount your network file system, it’s no longer connected.
But if your remote storage device is mounted as a block device (NBD), you can do anything to it that you’d be able to do to a local block device.
In other words, with NBD, you can take a device like /dev/sda on one machine and make it available to another machine as if it were a local device there connected via a SCSI or SATA cable, even though in actuality it is connected over the network.
You can also boot a complete OS from NBD over the network (e.g. scaleway.com users boot from NBD every time).
Getting started with NBD
I used Debian for the example below (it also works with derivatives such as Ubuntu).
NBD works according to a client/server architecture. You use the server to make a volume available as a network block device from a host, then run the client to connect to it from another host.
ServerSide
apt-get install nbd-server
modprobe nbd
After installation, you can export a device or a file.
Export device
Example: export a ServerSide disk device like /dev/sda on port 9999
Export img file
Example: export a ServerSide .img file like vmdisk.img on port 9998
An .img file over NBD can be useful if you’re working with virtual disk images, for example.
ClientSide
On the client machine that we want to use to connect to the NBD export we just created, we first need to install the NBD client package with:
apt-get install nbd-client
modprobe nbd
Map the remote NBD exports as local devices, for example /dev/nbd0 and /dev/nbd1:
nbd-client 192.168.1.100 9999 /dev/nbd0
nbd-client 192.168.1.100 9998 /dev/nbd1
What can you do now on the ClientSide with the mounted NBD?
You can start doing cool things with the NBD export from the client machine by using /dev/nbd0 as the target.
You are now able to use /dev/nbd0 like a local disk on the ClientSide.
Example: you can format it:
mkfs.ext4 /dev/nbd0
Other usage scenarios on the ClientSide:
+ You could resize partitions
+ You could create a filesystem (like local filesystem)
+ You could create btrfs/zfs/glusterfs storage pools
As long as the export that the client is using at /dev/nbd0 is mapped to a device like /dev/sda on the server, operations from the client on /dev/nbd0 will take effect on the server just as they would if you were running them locally on the server with /dev/sda as the target.
Important
Use this scenario only in protected local networks!
Never use NBD on public networks (e.g. over the internet) without configuring security first. The same rule applies to NFS.
All of the above said, NBD is a cool tool. It lets you do things that would otherwise not be possible. It doesn’t get much press these days (which is not surprising because NBD dates all the way back to the late 1990s, and hasn’t ever been important commercially), but it may be just the tool you need to solve some of the strange challenges that can arise in the life of a sysadmin.
Block Device Drivers
Lab objectives
- acquiring knowledge about the behavior of the I/O subsystem on Linux
- hands-on activities in structures and functions of block devices
- acquiring basic skills for utilizing the API for block devices, by solving exercises
Overview
Block devices are characterized by random access to data organized in fixed-size blocks. Examples of such devices are hard drives, CD-ROM drives, RAM disks, etc. The speed of block devices is generally much higher than the speed of character devices, and their performance is also important. This is why the Linux kernel handles these two types of devices differently (it uses a specialized API).
Working with block devices is therefore more complicated than working with character devices. Character devices have a single current position, while block devices must be able to move to any position in the device to provide random access to data. To simplify work with block devices, the Linux kernel provides an entire subsystem called the block I/O (or block layer) subsystem.
From the kernel perspective, the smallest logical unit of addressing is the block. Although the physical device can be addressed at sector level, the kernel performs all disk operations using blocks. Since the smallest unit of physical addressing is the sector, the size of the block must be a multiple of the size of the sector. Additionally, the block size must be a power of 2 and cannot exceed the size of a page. The size of the block may vary depending on the file system used, the most common values being 512 bytes, 1 kilobyte and 4 kilobytes.
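The three constraints on the block size can be checked mechanically. The following is a minimal userspace sketch, not kernel code: the function name block_size_is_valid() and the 512-byte sector and 4096-byte page sizes are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define ASSUMED_SECTOR_SIZE 512u   /* assumption: 512-byte sectors */
#define ASSUMED_PAGE_SIZE   4096u  /* assumption: 4 KiB pages; varies by architecture */

/* Returns true if `size` satisfies the three rules stated above:
 * a multiple of the sector size, a power of 2, and at most one page. */
bool block_size_is_valid(size_t size)
{
    if (size == 0 || size % ASSUMED_SECTOR_SIZE != 0)
        return false;
    if ((size & (size - 1)) != 0)   /* a power of 2 has exactly one bit set */
        return false;
    return size <= ASSUMED_PAGE_SIZE;
}
```

Under these assumptions, 512, 1024 and 4096 pass the check, while 1536 (not a power of 2) and 8192 (larger than a page) do not.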
Register a block I/O device
To register a block I/O device, function register_blkdev() is used. To deregister a block I/O device, function unregister_blkdev() is used.
Starting with version 4.9 of the Linux kernel, the call to register_blkdev() is optional. The only operations performed by this function are the dynamic allocation of a major (if the major argument is 0 when calling the function) and creating an entry in /proc/devices. In future kernel versions it may be removed; however, most drivers still call it.
Usually, the call to the register function is performed in the module initialization function, and the call to the deregister function is performed in the module exit function. A typical scenario is presented below:
Register a disk
Although the register_blkdev() function obtains a major, it does not provide a device (disk) to the system. For creating and using block devices (disks), a specialized interface defined in linux/genhd.h is used.
The useful functions defined in linux/genhd.h are to register/allocate a disk, add it to the system, and de-register/unmount the disk.
The alloc_disk() function is used to allocate a disk, and the del_gendisk() function is used to deallocate it. Adding the disk to the system is done using the add_disk() function.
The alloc_disk() and add_disk() functions are typically used in the module initialization function, and the del_gendisk() function in the module exit function.
As with character devices, it is recommended to use a driver-specific structure (named my_block_dev in this lab) to store the important elements describing the block device.
Note that immediately after calling the add_disk() function (actually even during the call), the disk is active and its methods can be called at any time. As a result, this function should not be called before the driver is fully initialized and ready to respond to requests for the registered disk.
It can be noticed that the basic structure in working with block devices (disks) is the struct gendisk structure.
After a call to del_gendisk(), the struct gendisk structure may continue to exist (and the device operations may still be called) if there are still users (an open operation was called on the device but the associated release operation has not been called). One solution is to keep the number of users of the device and call the del_gendisk() function only when there are no users left of the device.
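One way to picture the user-counting approach is the userspace model below. It is only a sketch: struct model_dev and its functions are hypothetical, the disk_deleted flag stands in for the point where a driver would call del_gendisk(), and a real driver would also need locking around the counter.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a block device that tracks its users. */
struct model_dev {
    int users;            /* opens that have not yet been released */
    bool remove_pending;  /* removal was requested while users remained */
    bool disk_deleted;    /* stands in for having called del_gendisk() */
};

void model_open(struct model_dev *dev)
{
    dev->users++;
}

void model_release(struct model_dev *dev)
{
    dev->users--;
    /* The last user is gone: a deferred removal can now be carried out. */
    if (dev->users == 0 && dev->remove_pending)
        dev->disk_deleted = true;
}

void model_request_remove(struct model_dev *dev)
{
    dev->remove_pending = true;
    if (dev->users == 0)
        dev->disk_deleted = true;   /* nobody is using the disk: delete now */
}
```

In this model a removal requested while the device is open is postponed until the matching release arrives.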
struct gendisk structure
The struct gendisk structure stores information about a disk. As stated above, such a structure is obtained from the alloc_disk() call and its fields must be filled before it is sent to the add_disk() function.
The struct gendisk structure has the following important fields:
- major, first_minor, minors, describing the identifiers used by the disk; a disk must have at least one minor; if the disk allows the partitioning operation, a minor must be allocated for each possible partition
- disk_name, which represents the disk name as it appears in /proc/partitions and in sysfs (/sys/block)
- fops, representing operations associated with the disk
- queue, which represents the queue of requests
- capacity, which is disk capacity in 512 byte sectors; it is initialized using the set_capacity() function
- private_data, which is a pointer to private data
An example of filling a struct gendisk structure is presented below:
As stated before, the kernel considers a disk as a vector of 512 byte sectors. In reality, the devices may have a different size of the sector. To work with these devices, the kernel needs to be informed about the real size of a sector, and for all operations the necessary conversions must be made.
To inform the kernel about the device sector size, a parameter of the request queue must be set just after the request queue is allocated, using the blk_queue_logical_block_size() function. All requests generated by the kernel will be multiple of this sector size and will be aligned accordingly. However, communication between the device and the driver will still be performed in sectors of 512 bytes in size, so conversion should be done each time (an example of such conversion is when calling the set_capacity() function in the code above).
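The conversion between device sectors and the kernel's 512-byte sectors is plain arithmetic. A minimal userspace sketch, assuming the device sector size is a multiple of 512 bytes; the function names are hypothetical and are not part of the kernel API:

```c
#include <stdint.h>

#define KERNEL_SECTOR_SIZE 512u

/* Capacity, in 512-byte units, of a device that has nr_dev_sectors sectors
 * of dev_sector_size bytes each (the kind of value a capacity field holds). */
uint64_t capacity_in_kernel_sectors(uint64_t nr_dev_sectors, uint32_t dev_sector_size)
{
    return nr_dev_sectors * (dev_sector_size / KERNEL_SECTOR_SIZE);
}

/* Translate a 512-byte sector number, as seen by the kernel, into the
 * index of the device sector that contains it. */
uint64_t kernel_to_device_sector(uint64_t kernel_sector, uint32_t dev_sector_size)
{
    return kernel_sector / (dev_sector_size / KERNEL_SECTOR_SIZE);
}

/* Translate a device sector index back into 512-byte units. */
uint64_t device_to_kernel_sector(uint64_t dev_sector, uint32_t dev_sector_size)
{
    return dev_sector * (dev_sector_size / KERNEL_SECTOR_SIZE);
}
```

For example, a device with 1024 sectors of 4096 bytes has a capacity of 8192 kernel sectors, and kernel sector 16 falls in device sector 2.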
struct block_device_operations structure
Just as the operations in struct file_operations must be filled in for a character device, the operations in struct block_device_operations must be filled in for a block device. The association of operations is done through the fops field in the struct gendisk structure.
Some of the fields of the struct block_device_operations structure are presented below:
open() and release() operations are called directly from user space by utilities that may perform the following tasks: partitioning, file system creation, file system verification. In a mount() operation, the open() function is called directly from the kernel space, the file descriptor being stored by the kernel. A driver for a block device can not differentiate between open() calls performed from user space and kernel space.
An example of how to use these two functions is given below:
Please notice that there are no read or write operations. These operations are performed by the request() function associated with the request queue of the disk.
Request Queues — Multi-Queue Block Layer
Drivers for block devices use queues to store the block I/O requests that will be processed. A request queue is represented by the struct request_queue structure. The request queue is made up of a doubly linked list of requests and their associated control information. The requests are added to the queue by higher-level kernel code (for example, file systems).
The block device driver associates each queue with a handling function, which will be called for each request in the queue (the struct request structure).
In earlier versions of the Linux kernel, each device driver had one or more associated request queues (struct request_queue), where any client could add requests, while also being able to reorder them. The problem with this approach is that it requires a per-queue lock, making it inefficient on multi-core systems.
The Multi-Queue Block Queueing Mechanism solves this issue by splitting the device driver queue in two parts:
- Software staging queues
- Hardware dispatch queues
Software staging queues
The staging queues hold requests from the clients before sending them to the block device driver. To avoid waiting for a per-queue lock, a staging queue is allocated for each CPU or node. A software queue is associated with only one hardware queue.
While in this queue, the requests can be merged or reordered, according to an I/O Scheduler, in order to maximize performance. This means that only the requests coming from the same CPU or node can be optimized.
Staging queues are usually not used by the block device drivers, but only internally by the I/O subsystem to optimize requests before sending them to the device drivers.
Hardware dispatch queues
The hardware queues (struct blk_mq_hw_ctx) are used to send the requests from the staging queues to the block device driver. Once in this queue, the requests can’t be merged or reordered.
Depending on the underlying hardware, a block device driver can create multiple hardware queues in order to improve parallelism and maximize performance.
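The relationship between the two kinds of queues can be pictured as a table that maps each CPU's staging queue to exactly one hardware queue. The sketch below is a userspace illustration only: the modulo mapping and the name model_build_queue_map() are assumptions, and the kernel's actual mapping depends on the hardware and the CPU topology.

```c
/* Fill map[cpu] with the index of the hardware queue that serves the
 * staging queue of that CPU. Several staging queues may share one
 * hardware queue, but each staging queue maps to exactly one. */
void model_build_queue_map(unsigned int map[], unsigned int nr_cpus,
                           unsigned int nr_hw_queues)
{
    for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
        map[cpu] = cpu % nr_hw_queues;   /* assumed round-robin spread */
}
```

With 8 CPUs and 2 hardware queues, this model sends the even-numbered CPUs to hardware queue 0 and the odd-numbered ones to hardware queue 1.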
Tag sets
A block device driver can accept a request before the previous one is completed. As a consequence, the upper layers need a way to know when a request is completed. For this, a "tag" is added to each request upon submission and sent back using a completion notification after the request is completed.
The tags are part of a tag set (struct blk_mq_tag_set), which is unique to a device. The tag set structure is allocated and initialized before the request queues and also stores some of the queues' properties.
Some of the fields in struct blk_mq_tag_set are:
- ops — Queue operations, most notably the request handling function.
- nr_hw_queues — The number of hardware queues allocated for the device
- queue_depth — Hardware queues size
- cmd_size — Number of extra bytes allocated at the end of each request, to be used by the block device driver, if needed.
- numa_node — In NUMA systems, the index of the node the storage device is connected to.
- driver_data — Data private to the driver, if needed.
- tags — Pointer to an array of nr_hw_queues tag sets.
- tag_list — List of request queues using this tag set.
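The role of tags can be illustrated with a small userspace allocator. This is a sketch under stated assumptions, not the kernel's implementation: struct model_tags, model_tag_get() and model_tag_put() are hypothetical names, and MODEL_QUEUE_DEPTH plays the role of queue_depth.

```c
#include <stdint.h>

#define MODEL_QUEUE_DEPTH 4   /* assumed depth of one hardware queue */

struct model_tags {
    uint32_t in_use;   /* bit i set: tag i belongs to an in-flight request */
};

/* Submission: hand out a free tag, or -1 when the queue is full. */
int model_tag_get(struct model_tags *t)
{
    for (int tag = 0; tag < MODEL_QUEUE_DEPTH; tag++) {
        if (!(t->in_use & (1u << tag))) {
            t->in_use |= 1u << tag;
            return tag;
        }
    }
    return -1;
}

/* Completion: the tag identifies which request finished; it becomes
 * available again for a later request. */
void model_tag_put(struct model_tags *t, int tag)
{
    t->in_use &= ~(1u << tag);
}
```

Once all four tags are in flight, model_tag_get() returns -1 until a completion returns one of them.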
Create and delete a request queue
Request queues are created using the blk_mq_init_queue() function and are deleted using blk_cleanup_queue(). The first function creates both the hardware and the software queues and initializes their structures.
Queue properties, including the number of hardware queues, their capacity and request handling function are configured using the blk_mq_tag_set structure, as described above.
An example of using these functions is as follows:
After initializing the tag set structure, the tag lists are allocated using the blk_mq_alloc_tag_set() function. The pointer to the function which will process the requests (my_block_request()) is filled in the my_queue_ops structure and then the pointer to this structure is added to the tag set.
The queue is created using the blk_mq_init_queue() function, based on the information added in the tag set.
As part of the request queue initialization, you can configure the queuedata field, which is equivalent to the private_data field in other structures.
Useful functions for processing request queues
The queue_rq function from struct blk_mq_ops is used to handle requests for working with the block device. This function is the equivalent of read and write functions encountered on character devices. The function receives the requests for the device as arguments and can use various functions for processing them.
The functions used to process the requests in the handler are described below:
- blk_mq_start_request() — must be called before starting processing a request;
- blk_mq_requeue_request() — to re-send the request in the queue;
- blk_mq_end_request() — to end request processing and notify the upper layers.
Requests for block devices
A request for a block device is described by the struct request structure.
The fields of the struct request structure include:
- cmd_flags: a series of flags including direction (reading or writing); to find out the direction, the rq_data_dir macro is used, which returns 0 for a read request and 1 for a write request on the device;
- __sector: the first sector of the transfer request; if the device sector has a different size, the appropriate conversion should be done. To access this field, use the blk_rq_pos macro;
- __data_len: the total number of bytes to be transferred; to access this field the blk_rq_bytes macro is used;
- generally, data from the current struct bio will be transferred; the data size is obtained using the blk_rq_cur_bytes macro;
- bio, a dynamic list of struct bio structures that is a set of buffers associated with the request; this field is accessed by the rq_for_each_segment macro if there are multiple buffers, or by the bio_data macro in case there is only one associated buffer;
We will discuss more about the struct bio structure and its associated operations in the struct bio structure section.
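How these fields are typically read can be sketched with a userspace stand-in. Everything here is a model: struct model_request is not the kernel's struct request, the accessor names only echo the macros listed above, and keeping the direction in the lowest bit of cmd_flags is an assumption of this sketch.

```c
#include <stdint.h>

enum model_op { MODEL_OP_READ = 0, MODEL_OP_WRITE = 1 };

/* Simplified stand-in for the request fields discussed above. */
struct model_request {
    unsigned int cmd_flags;  /* assumption: lowest bit holds the direction */
    uint64_t sector;         /* first sector, in 512-byte units */
    unsigned int data_len;   /* total number of bytes to transfer */
};

/* 0 for a read, 1 for a write (mirrors what rq_data_dir reports). */
int model_data_dir(const struct model_request *rq)
{
    return rq->cmd_flags & 1;
}

/* First sector of the transfer (mirrors blk_rq_pos). */
uint64_t model_pos(const struct model_request *rq)
{
    return rq->sector;
}

/* Total bytes of the transfer (mirrors blk_rq_bytes). */
unsigned int model_bytes(const struct model_request *rq)
{
    return rq->data_len;
}

/* Byte offset on the device at which the transfer starts. */
uint64_t model_byte_offset(const struct model_request *rq)
{
    return rq->sector * 512u;
}
```

A write of 4096 bytes starting at sector 8 therefore starts at byte offset 4096 on the device.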
Create a request
Read/write requests are created by code layers superior to the kernel I/O subsystem. Typically, the subsystem that creates requests for block devices is the file management subsystem. The I/O subsystem acts as an interface between the file management subsystem and the block device driver. The main operations under the responsibility of the I/O subsystem are adding requests to the queue of the specific block device and sorting and merging requests according to performance considerations.
Process a request
The central part of a block device driver is the request handling function (queue_rq). In previous examples, the function that fulfilled this role was my_block_request(). As stated in the Create and delete a request queue section, this function is associated with the driver when creating the tag set structure.
This function is called when the kernel considers that the driver should process I/O requests. The function must start processing the requests from the queue, but it is not mandatory to finish them, as requests may be finished by other parts of the driver.
The request function runs in an atomic context and must follow the rules for atomic code (it must not call functions that can cause sleep, etc.).
Calling the function that processes the requests is asynchronous relative to the actions of any userspace process and no assumptions about the process in which the respective function is running should be made. Also, it should not be assumed that the buffer provided by a request is from kernel space or user space; any operation that accesses userspace is erroneous.
One of the simplest request handling functions is presented below:
The my_block_request() function performs the following operations:
- Get a pointer to the request structure from the bd argument and start its processing using the blk_mq_start_request() function.
- A block device can receive calls which do not transfer data blocks (e.g. low level operations on the disk, instructions referring to special ways of accessing the device). Most drivers do not know how to handle these requests and return an error.
- To return an error, blk_mq_end_request() function is called, BLK_STS_IOERR being the second argument.
- The request is processed according to the needs of the associated device.
- The request ends. In this case, blk_mq_end_request() function is called in order to complete the request.
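The sequence of steps above can be followed in a userspace model backed by a RAM buffer. This is a sketch, not the lab's my_block_request(): the ram_* names, the struct layout and the status values are hypothetical, and the bounds check is an addition of this sketch.

```c
#include <stdint.h>
#include <string.h>

#define RAM_SECTOR_SIZE 512u
#define RAM_NR_SECTORS  16u

enum ram_status { RAM_STS_OK = 0, RAM_STS_IOERR = 1 };

struct ram_request {
    int is_passthrough;     /* request that does not transfer data blocks */
    int write;              /* 0 = read, 1 = write */
    uint64_t sector;        /* first 512-byte sector */
    unsigned int len;       /* bytes to transfer */
    unsigned char *buffer;  /* data buffer of the request */
    int started, ended;     /* lifecycle markers set by the handler */
    enum ram_status status;
};

static unsigned char ram_disk[RAM_NR_SECTORS * RAM_SECTOR_SIZE];

static enum ram_status ram_end_request(struct ram_request *rq, enum ram_status sts)
{
    rq->ended = 1;          /* notify the "upper layer" of the outcome */
    rq->status = sts;
    return sts;
}

enum ram_status ram_queue_rq(struct ram_request *rq)
{
    uint64_t offset = rq->sector * RAM_SECTOR_SIZE;

    rq->started = 1;                              /* start processing */
    if (rq->is_passthrough)                       /* no data blocks: error */
        return ram_end_request(rq, RAM_STS_IOERR);
    if (offset + rq->len > sizeof(ram_disk))      /* added: bounds check */
        return ram_end_request(rq, RAM_STS_IOERR);

    if (rq->write)                                /* process the request */
        memcpy(ram_disk + offset, rq->buffer, rq->len);
    else
        memcpy(rq->buffer, ram_disk + offset, rq->len);

    return ram_end_request(rq, RAM_STS_OK);       /* end the request */
}
```

Writing a sector and reading it back through ram_queue_rq() returns the same bytes, while a passthrough request or an out-of-range sector ends with RAM_STS_IOERR.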
struct bio structure
Each struct request structure is an I/O block request, but may come from combining several independent requests from a higher level. The sectors to be transferred for a request can be scattered in main memory, but they always correspond to a set of consecutive sectors on the device. The request is represented as a series of segments, each corresponding to a buffer in memory. The kernel can combine requests that refer to adjacent sectors but will not combine write requests with read requests into a single struct request structure.
A struct request structure is implemented as a linked list of struct bio structures together with information that allows the driver to retain its current position while processing the request.
The struct bio structure is a low-level description of a portion of a block I/O request.
In turn, the struct bio structure contains a bi_io_vec vector of struct bio_vec structures. It consists of the individual pages in the physical memory to be transferred, the offset within the page and the size of the buffer. To iterate through a struct bio structure, we need to iterate through the vector of struct bio_vec and transfer the data from every physical page. To simplify vector iteration, the struct bvec_iter structure is used. This structure maintains information about how many buffers and sectors were consumed during the iteration. The request type is encoded in the bi_opf field; to determine it, use the bio_data_dir() function.
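The bookkeeping done by the iterator can be modeled in userspace. The types below are simplified stand-ins with hypothetical names (they are not the kernel's struct bio, struct bio_vec or struct bvec_iter), and the sketch assumes that every segment length is a multiple of 512 bytes.

```c
#include <stddef.h>
#include <stdint.h>

struct model_bvec {          /* one segment: a buffer inside a page */
    unsigned char *page;
    unsigned int len;
    unsigned int offset;
};

struct model_iter {          /* state kept while walking the segments */
    uint64_t sector;         /* current device sector, 512-byte units */
    unsigned int size;       /* bytes still to be processed */
    unsigned int idx;        /* index of the next segment */
};

struct model_bio {
    struct model_bvec *vec;  /* vector of segments */
    unsigned int vcnt;       /* number of segments */
    struct model_iter iter;
};

/* Return the data of the next segment and its length, advancing the
 * iterator; return NULL once every segment has been consumed. */
unsigned char *model_next_segment(struct model_bio *bio, unsigned int *len)
{
    struct model_bvec *bv;

    if (bio->iter.idx >= bio->vcnt || bio->iter.size == 0)
        return NULL;
    bv = &bio->vec[bio->iter.idx++];
    *len = bv->len;
    bio->iter.size -= bv->len;
    bio->iter.sector += bv->len / 512u;
    return bv->page + bv->offset;
}
```

Walking two 1024-byte segments that start at sector 10 leaves the iterator at sector 14 with no bytes left.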
Create a struct bio structure
Two functions can be used to create a struct bio structure:
- bio_alloc() : allocates space for a new structure; the structure must be initialized;
- bio_clone() : makes a copy of an existing struct bio structure; the newly obtained structure is initialized with the values of the cloned structure fields; the buffers are shared with the struct bio structure that has been cloned so that access to the buffers has to be done carefully to avoid access to the same memory area from the two clones;
Both functions return a new struct bio structure.
Submit a struct bio structure
Usually, a struct bio structure is created by the higher levels of the kernel (usually the file system). A structure thus created is then transmitted to the I/O subsystem that gathers more struct bio structures into a request.
For submitting a struct bio structure to the associated I/O device driver, the submit_bio() function is used. The function receives as argument an initialized struct bio structure that will be added to a request from the request queue of an I/O device. From that queue, it can be processed by the I/O device driver using a specialized function.
Wait for the completion of a struct bio structure
Submitting a struct bio structure to a driver has the effect of adding it to a request from the request queue from where it will be further processed. Thus, when the submit_bio() function returns, it is not guaranteed that the processing of the structure has finished. If you want to wait for the processing of the request to be finished, use the submit_bio_wait() function.
To be notified when the processing of a struct bio structure ends (when we do not use submit_bio_wait() function), the bi_end_io field of the structure should be used. This field specifies the function that will be called at the end of the struct bio structure processing. You can use the bi_private field of the structure to pass information to the function.
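The callback mechanism can be sketched in userspace as a function pointer plus an opaque pointer. The names below are hypothetical: end_io and private_data play the roles of bi_end_io and bi_private, and cb_bio_complete() stands in for the point where processing of the structure ends.

```c
struct cb_bio;
typedef void (*cb_end_io_t)(struct cb_bio *bio);

struct cb_bio {
    cb_end_io_t end_io;   /* called when processing ends */
    void *private_data;   /* information handed to the callback */
    int status;           /* outcome of the processing */
};

/* Stand-in for the end of processing: record the outcome and notify
 * whoever submitted the structure. */
void cb_bio_complete(struct cb_bio *bio, int status)
{
    bio->status = status;
    if (bio->end_io)
        bio->end_io(bio);
}

/* Example callback: set a flag that the submitter passed in through
 * private_data. */
void cb_mark_done(struct cb_bio *bio)
{
    int *done = bio->private_data;

    *done = 1;
}
```

The submitter keeps the flag, passes its address through private_data and checks it after completion.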
Initialize a struct bio structure
Once a struct bio structure has been allocated and before being transmitted, it must be initialized.
Initializing the structure involves filling in its important fields. As mentioned above, the bi_end_io field is used to specify the function called when the processing of the structure is finished. The bi_private field is used to store useful data that can be accessed in the function pointed to by bi_end_io.
The bi_opf field specifies the type of operation.
In the code snippet above, we specified the block device to which the struct bio structure is sent, the start sector, the operation (REQ_OP_READ or REQ_OP_WRITE) and the content. The content of a struct bio structure is a buffer described by a physical page, the offset in the page and the size of the buffer. A page can be allocated using the alloc_page() call.
The size field of the bio_add_page() call must be a multiple of the device sector size.
How to use the content of a struct bio structure
To use the content of a struct bio structure, the structure’s support pages must be mapped to the kernel address space from where they can be accessed. For mapping/unmapping, use the kmap_atomic and the kunmap_atomic macros.
A typical example of use is:
As it can be seen from the example above, iterating through a struct bio requires iterating through all of its segments. A segment ( struct bio_vec ) is defined by the physical address page, the offset in the page and its size.
To simplify the processing of a struct bio, use the bio_for_each_segment macro. It will iterate through all segments, and will also update global information stored in an iterator (struct bvec_iter) such as the current sector as well as other internal information (segment vector index, number of bytes left to be processed, etc.).
You can store information in the mapped buffer, or extract information.
In case request queues are used and you need to process the requests at struct bio level, use the rq_for_each_segment macro instead of the bio_for_each_segment macro. This macro iterates through each segment of each struct bio structure of a struct request structure and updates a struct req_iterator structure. The struct req_iterator contains the current struct bio structure and the iterator that traverses its segments.
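The request-level traversal amounts to two nested loops: over the struct bio structures of the request, and over the segments of each one. A userspace sketch with hypothetical types and names (the kernel macro and its iterator are not used here) that gathers scattered segments into one contiguous buffer:

```c
#include <stddef.h>
#include <string.h>

struct model_seg {               /* one segment of data in memory */
    const unsigned char *data;
    unsigned int len;
};

struct model_rbio {              /* a bio: segments plus link to the next bio */
    const struct model_seg *segs;
    unsigned int nsegs;
    struct model_rbio *next;
};

struct model_rq {                /* a request: linked list of bios */
    struct model_rbio *bio;
};

/* Copy every segment of every bio, in order, into dst. Stops before a
 * segment that would not fit. Returns the number of bytes copied. */
size_t gather_request(const struct model_rq *rq, unsigned char *dst, size_t dst_len)
{
    size_t done = 0;

    for (const struct model_rbio *b = rq->bio; b != NULL; b = b->next) {
        for (unsigned int i = 0; i < b->nsegs; i++) {
            const struct model_seg *s = &b->segs[i];

            if (done + s->len > dst_len)
                return done;
            memcpy(dst + done, s->data, s->len);
            done += s->len;
        }
    }
    return done;
}
```

The segments may live anywhere in memory, but the order of traversal matches the consecutive sectors they cover on the device.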
A typical example of use is:
Free a struct bio structure
Once a kernel subsystem has finished using a struct bio structure, it has to release its reference to it. This is done by calling the bio_put() function.
Set up a request queue at struct bio level
We have previously seen how we can specify a function to be used to process requests sent to the driver. The function receives as argument the requests and carries out processing at struct request level.
If, for flexibility reasons, we need to specify a function that carries out processing at struct bio structure level, we no longer use request queues and we will need to fill the submit_bio field in the struct block_device_operations associated to the driver.
Below is a typical example of initializing a function that carries out processing at struct bio structure level: