The Linux Kernel API

This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

For more details see the file COPYING in the source distribution of Linux.

Table of Contents

1. Data Types
  Doubly Linked Lists

2. Basic C Library Functions
  String Conversions
  String Manipulation
  Bit Operations

3. Basic Kernel Library Functions
  Bitmap Operations
  Command-line Parsing
  CRC Functions
  idr/ida Functions

4. Memory Management in Linux
  The Slab Cache
  User Space Memory Access
  More Memory Management Functions

5. Kernel IPC facilities
  IPC utilities

6. FIFO Buffer
  kfifo interface

7. relay interface support
  relay interface

8. Module Support
  Module Loading
  Inter Module support

9. Hardware Interfaces
  Interrupt Handling
  DMA Channels
  Resources Management
  MTRR Handling
  PCI Support Library
  PCI Hotplug Support Library

10. Firmware Interfaces
  DMI Interfaces
  EDD Interfaces

11. Security Framework
  security_init - initializes the security framework
  security_module_enable - Load given security module on boot ?
  security_add_hooks - Add a modules hooks to the hook lists.
  securityfs_create_file - create a file in the securityfs filesystem
  securityfs_create_dir - create a directory in the securityfs filesystem
  securityfs_remove - removes a file or directory from the securityfs filesystem

12. Audit Interfaces
  audit_log_start - obtain an audit buffer
  audit_log_format - format a message into the audit buffer.
  audit_log_end - end one audit record
  audit_log - Log an audit record
  audit_log_secctx - Converts and logs SELinux context
  audit_alloc - allocate an audit context block for a task
  __audit_free - free a per-task audit context
  __audit_syscall_entry - fill in an audit record at syscall entry
  __audit_syscall_exit - deallocate audit context after a system call
  __audit_reusename - fill out filename with info from existing entry
  __audit_getname - add a name to the list
  __audit_inode - store the inode and device from a lookup
  auditsc_get_stamp - get local copies of audit_context values
  audit_set_loginuid - set current task’s audit_context loginuid
  __audit_mq_open - record audit data for a POSIX MQ open
  __audit_mq_sendrecv - record audit data for a POSIX MQ timed send/receive
  __audit_mq_notify - record audit data for a POSIX MQ notify
  __audit_mq_getsetattr - record audit data for a POSIX MQ get/set attribute
  __audit_ipc_obj - record audit data for ipc object
  __audit_ipc_set_perm - record audit data for new ipc permissions
  __audit_socketcall - record audit data for sys_socketcall
  __audit_fd_pair - record audit data for pipe and socketpair
  __audit_sockaddr - record audit data for sys_bind, sys_connect, sys_sendto
  audit_signal_info - record signal info for shutting down audit subsystem
  __audit_log_bprm_fcaps - store information about a loading bprm and relevant fcaps
  __audit_log_capset - store information about the arguments to the capset syscall
  audit_core_dumps - record information about processes that end abnormally
  audit_rule_change - apply all rules to the specified message type
  audit_list_rules_send - list the audit rules
  parent_len - find the length of the parent portion of a pathname
  audit_compare_dname_path - compare given dentry name with last component in given path. Return of 0 indicates a match.

13. Accounting Framework
  sys_acct - enable/disable process accounting
  acct_collect - collect accounting information into pacct_struct
  acct_process

14. Block Devices
  blk_delay_queue - restart queueing after defined interval
  blk_start_queue_async - asynchronously restart a previously stopped queue
  blk_start_queue - restart a previously stopped queue
  blk_stop_queue - stop a queue
  blk_sync_queue - cancel any pending callbacks on a queue
  __blk_run_queue_uncond - run a queue whether or not it has been stopped
  __blk_run_queue - run a single device queue
  blk_run_queue_async - run a single device queue in workqueue context
  blk_run_queue - run a single device queue
  blk_queue_bypass_start - enter queue bypass mode
  blk_queue_bypass_end - leave queue bypass mode
  blk_cleanup_queue - shutdown a request queue
  blk_init_queue - prepare a request queue for use with a block device
  blk_requeue_request - put a request back on queue
  part_round_stats - Round off the performance stats on a struct disk_stats.
  generic_make_request - hand a buffer to its device driver for I/O
  submit_bio - submit a bio to the block device layer for I/O
  blk_insert_cloned_request - Helper for stacking drivers to submit a request
  blk_rq_err_bytes - determine number of bytes till the next failure boundary
  blk_peek_request - peek at the top of a request queue
  blk_start_request - start request processing on the driver
  blk_fetch_request - fetch a request from a request queue
  blk_update_request - Special helper function for request stacking drivers
  blk_unprep_request - unprepare a request
  blk_end_request - Helper function for drivers to complete the request.
  blk_end_request_all - Helper function for drives to finish the request.
  blk_end_request_cur - Helper function to finish the current request chunk.
  blk_end_request_err - Finish a request till the next failure boundary.
  __blk_end_request - Helper function for drivers to complete the request.
  __blk_end_request_all - Helper function for drives to finish the request.
  __blk_end_request_cur - Helper function to finish the current request chunk.
  __blk_end_request_err - Finish a request till the next failure boundary.
  rq_flush_dcache_pages - Helper function to flush all pages in a request
  blk_lld_busy - Check if underlying low-level drivers of a device are busy
  blk_rq_unprep_clone - Helper function to free all bios in a cloned request
  blk_rq_prep_clone - Helper function to setup clone request
  blk_start_plug - initialize blk_plug and track it inside the task_struct
  blk_pm_runtime_init - Block layer runtime PM initialization routine
  blk_pre_runtime_suspend - Pre runtime suspend check
  blk_post_runtime_suspend - Post runtime suspend processing
  blk_pre_runtime_resume - Pre runtime resume processing
  blk_post_runtime_resume - Post runtime resume processing
  blk_set_runtime_active - Force runtime status of the queue to be active
  __blk_drain_queue - drain requests from request_queue
  __get_request - get a free request
  get_request - get a free request
  blk_attempt_plug_merge - try to merge with current's plugged list
  blk_cloned_rq_check_limits - Helper function to check a cloned request for new the queue limits
  blk_end_bidi_request - Complete a bidi request
  __blk_end_bidi_request - Complete a bidi request with queue lock held
  blk_rq_map_user_iov - map user data to a request, for passthrough requests
  blk_rq_unmap_user - unmap a request with user data
  blk_rq_map_kern - map kernel data to a request, for passthrough requests
  blk_release_queue - release a struct request_queue when it is no longer needed
  blk_queue_prep_rq - set a prepare_request function for queue
  blk_queue_unprep_rq - set an unprepare_request function for queue
  blk_set_default_limits - reset limits to default values
  blk_set_stacking_limits - set default limits for stacking devices
  blk_queue_make_request - define an alternate make_request function for a device
  blk_queue_bounce_limit - set bounce buffer limit for queue
  blk_queue_max_hw_sectors - set max sectors for a request for this queue
  blk_queue_chunk_sectors - set size of the chunk for this queue
  blk_queue_max_discard_sectors - set max sectors for a single discard
  blk_queue_max_write_same_sectors - set max sectors for a single write same
  blk_queue_max_write_zeroes_sectors - set max sectors for a single write zeroes
  blk_queue_max_segments - set max hw segments for a request for this queue
  blk_queue_max_discard_segments - set max segments for discard requests
  blk_queue_max_segment_size - set max segment size for blk_rq_map_sg
  blk_queue_logical_block_size - set logical block size for the queue
  blk_queue_physical_block_size - set physical block size for the queue
  blk_queue_alignment_offset - set physical block alignment offset
  blk_limits_io_min - set minimum request size for a device
  blk_queue_io_min - set minimum request size for the queue
  blk_limits_io_opt - set optimal request size for a device
  blk_queue_io_opt - set optimal request size for the queue
  blk_queue_stack_limits - inherit underlying queue limits for stacked drivers
  blk_stack_limits - adjust queue_limits for stacked devices
  bdev_stack_limits - adjust queue limits for stacked drivers
  disk_stack_limits - adjust queue limits for stacked drivers
  blk_queue_dma_pad - set pad mask
  blk_queue_update_dma_pad - update pad mask
  blk_queue_dma_drain - Set up a drain buffer for excess dma.
  blk_queue_segment_boundary - set boundary rules for segment merging
  blk_queue_virt_boundary - set boundary rules for bio merging
  blk_queue_dma_alignment - set dma length and memory alignment
  blk_queue_update_dma_alignment - update dma length and memory alignment
  blk_set_queue_depth - tell the block layer about the device queue depth
  blk_queue_write_cache - configure queue’s write cache
  blk_execute_rq_nowait - insert a request into queue for execution
  blk_execute_rq - insert a request into queue for execution
  blkdev_issue_flush - queue a flush
  blkdev_issue_discard - queue a discard
  blkdev_issue_write_same - queue a write same operation
  __blkdev_issue_zeroout - generate number of zero filed write bios
  blkdev_issue_zeroout - zero-fill a block range
  blk_queue_find_tag - find a request by its tag and queue
  blk_free_tags - release a given set of tag maintenance info
  blk_queue_free_tags - release tag maintenance info
  blk_init_tags - initialize the tag info for an external tag map
  blk_queue_init_tags - initialize the queue tag info
  blk_queue_resize_tags - change the queueing depth
  blk_queue_end_tag - end tag operations for a request
  blk_queue_start_tag - find a free tag and assign it
  blk_queue_invalidate_tags - invalidate all pending tags
  __blk_queue_free_tags - release tag maintenance info
  blk_rq_count_integrity_sg - Count number of integrity scatterlist elements
  blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist
  blk_integrity_compare - Compare integrity profile of two disks
  blk_integrity_register - Register a gendisk as being integrity-capable
  blk_integrity_unregister - Unregister block integrity profile
  blk_trace_ioctl - handle the ioctls associated with tracing
  blk_trace_shutdown - stop and cleanup trace structures
  blk_add_trace_rq - Add a trace for a request oriented action
  blk_add_trace_bio - Add a trace for a bio oriented action
  blk_add_trace_bio_remap - Add a trace for a bio-remap operation
  blk_add_trace_rq_remap - Add a trace for a request-remap operation
  blk_mangle_minor - scatter minor numbers apart
  blk_alloc_devt - allocate a dev_t for a partition
  blk_free_devt - free a dev_t
  disk_replace_part_tbl - replace disk->part_tbl in RCU-safe way
  disk_expand_part_tbl - expand disk->part_tbl
  disk_block_events - block and flush disk event checking
  disk_unblock_events - unblock disk event checking
  disk_flush_events - schedule immediate event checking and flushing
  disk_clear_events - synchronously check, clear and return pending events
  disk_get_part - get partition
  disk_part_iter_init - initialize partition iterator
  disk_part_iter_next - proceed iterator to the next partition and return it
  disk_part_iter_exit - finish up partition iteration
  disk_map_sector_rcu - map sector to partition
  register_blkdev - register a new block device
  device_add_disk - add partitioning information to kernel list
  get_gendisk - get partitioning information for a given device
  bdget_disk - do bdget by gendisk and partition number

15. Char devices
  register_chrdev_region - register a range of device numbers
  alloc_chrdev_region - register a range of char device numbers
  __register_chrdev - create and register a cdev occupying a range of minors
  unregister_chrdev_region - unregister a range of device numbers
  __unregister_chrdev - unregister and destroy a cdev
  cdev_add - add a char device to the system
  cdev_del - remove a cdev from the system
  cdev_alloc - allocate a cdev structure
  cdev_init - initialize a cdev structure

16. Miscellaneous Devices
  misc_register - register a miscellaneous device
  misc_deregister - unregister a miscellaneous device

17. Clock Framework
  struct clk_notifier - associate a clk with a notifier
  struct clk_notifier_data - rate data to pass to the notifier callback
  clk_notifier_register - change notifier callback
  clk_notifier_unregister - change notifier callback
  clk_get_accuracy - obtain the clock accuracy in ppb (parts per billion) for a clock source.
  clk_set_phase - adjust the phase shift of a clock signal
  clk_get_phase - return the phase shift of a clock signal
  clk_is_match - check if two clk’s point to the same hardware clock
  clk_prepare - prepare a clock source
  clk_unprepare - undo preparation of a clock source
  clk_get - lookup and obtain a reference to a clock producer.
  devm_clk_get - lookup and obtain a managed reference to a clock producer.
  devm_get_clk_from_child - lookup and obtain a managed reference to a clock producer from child node.
  clk_enable - inform the system when the clock source should be running.
  clk_disable - inform the system when the clock source is no longer required.
  clk_get_rate - obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.
  clk_put - "free" the clock source
  devm_clk_put - "free" a managed clock source
  clk_round_rate - adjust a rate to the exact rate a clock can provide
  clk_set_rate - set the clock rate for a clock source
  clk_has_parent - check if a clock is a possible parent for another
  clk_set_rate_range - set a rate range for a clock source
  clk_set_min_rate - set a minimum clock rate for a clock source
  clk_set_max_rate - set a maximum clock rate for a clock source
  clk_set_parent - set the parent clock source for this clock
  clk_get_parent - get the parent clock source for this clock
  clk_get_sys - get a clock based upon the device name


Kernel API

Lab objectives

Familiarize yourself with the basic Linux kernel API
Description of memory allocation mechanisms
Description of locking mechanisms

Overview

In this lab we present a set of concepts and basic functions required for starting Linux kernel programming. It is important to note that kernel programming differs greatly from user-space programming. The kernel is a stand-alone entity that cannot use user-space libraries (not even libc). As a result, the usual user-space functions (printf, malloc, free, open, read, write, memcpy, strcpy, etc.) are not available, although the kernel provides its own counterparts for some of them. Consequently, kernel programming is based on a separate, independent API that is unrelated to the user-space API, whether we refer to POSIX or ANSI C (the standard C library functions).

Accessing memory

An important difference in kernel programming is how memory is accessed and allocated. Because kernel programming is very close to the physical machine, there are important rules for memory management. First, the kernel works with several types of memory:

Physical memory
Virtual memory from the kernel address space
Virtual memory from a process’s address space
Resident memory: memory whose accessed pages are known to be present in physical memory

Virtual memory in a process’s address space cannot be considered resident because of the virtual memory mechanisms implemented by the operating system: pages may be swapped out, or may simply not be present in physical memory yet as a result of demand paging. Memory in the kernel address space may or may not be resident. Both the data and code segments of a module and the kernel stack of a process are resident. Dynamic memory may or may not be resident, depending on how it is allocated.

When working with resident memory, things are simple: it can be accessed at any time. Non-resident memory, however, can only be accessed from process context. Accessing non-resident memory from interrupt context has unpredictable results; therefore, when the operating system detects such an access, it takes drastic measures: halting or resetting the system to prevent serious corruption.

The virtual memory of a process cannot be accessed directly from the kernel. In general, accessing the address space of a process is strongly discouraged, but there are situations where a device driver needs to do it. The typical case is a device driver that needs to access a buffer in user space. In this case, the driver must use dedicated functions (for example copy_from_user() and copy_to_user()) rather than dereference the buffer directly. This is necessary to prevent access to invalid memory areas.

Another difference from user-space programming, with respect to memory, is the stack: the kernel stack has a fixed and limited size. A stack of 4K is used in Linux (the exact size depends on the architecture and kernel configuration, and many current configurations use larger stacks such as 8K or 16K), and a stack of 12K is used in Windows. For this reason, allocating large structures on the stack and using recursive calls should be avoided.

Contexts of execution

In relation to kernel execution, we distinguish two contexts: process context and interrupt context. We are in process context when we run code as a result of a system call or when we run in the context of a kernel thread. When we run in a routine that handles an interrupt or a deferrable action, we are in interrupt context.

Some kernel API calls can block the current process. Common examples are acquiring a semaphore or waiting for a condition. In this case, the process is put into the WAITING state and another process runs. An interesting situation occurs when a function that can suspend the current process is called from interrupt context. In this case, there is no current process, and therefore the results are unpredictable. Whenever the operating system detects this condition, it generates an error that causes the operating system to shut down.

Locking

One of the most important features of kernel programming is parallelism. Linux supports SMP systems with multiple processors as well as kernel preemptivity. This makes kernel programming more difficult, because access to global variables must be synchronized with either spinlock primitives or blocking primitives. Although blocking primitives are recommended, they cannot be used in interrupt context, so the only locking solution in interrupt context is the spinlock.

Spinlocks are used to achieve mutual exclusion. When a thread cannot enter the critical region, it is not suspended; instead it busy-waits (spinning in a while() loop until the lock is released). The code that runs in a critical region protected by a spinlock is not allowed to suspend the current process (it must obey the same execution conditions as interrupt context). Moreover, the CPU will not be released except when an interrupt occurs. Because of this mechanism, it is important that a spinlock is held for as short a time as possible.
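
The busy-waiting behavior described above can be modelled in user space. The following is a minimal sketch built on C11 atomics, not the kernel implementation; model_spinlock, MODEL_SPINLOCK_INIT and the model_spin_* functions are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for spinlock_t (hypothetical name). */
struct model_spinlock {
    atomic_flag locked;
};

#define MODEL_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the lock was free and is now held by the caller. */
bool model_spin_trylock(struct model_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-wait: keep retrying instead of putting the caller to sleep. */
void model_spin_lock(struct model_spinlock *l)
{
    while (!model_spin_trylock(l))
        ;   /* spin until the holder releases the lock */
}

/* Release the lock so that a spinning waiter can acquire it. */
void model_spin_unlock(struct model_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Because a waiter burns CPU time for as long as the lock is held, the sketch also makes it visible why the critical region should stay short.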


Preemptivity

Linux uses preemptive kernels. The notion of preemptive multitasking should not be confused with the notion of a preemptive kernel. Preemptive multitasking means that the operating system forcefully interrupts a process running in user space when its quantum (time slice) expires, in order to run another process. A kernel is preemptive if a process running in kernel mode (as a result of a system call) can be interrupted so that another process can run.

Because of preemptivity, when we share resources between two portions of code that can run from different process contexts, we need to protect them with synchronization primitives, even on a single processor.

Linux Kernel API

Convention indicating errors

In Linux kernel programming, the convention functions use to report success or failure is the same as in UNIX programming: 0 for success, or a value other than 0 for failure. On failure, a negative error code is returned, for example -ENOMEM or -EINVAL.

The exhaustive list of errors and a summary explanation can be found in include/asm-generic/errno-base.h and in include/asm-generic/errno.h.
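
As an illustration of this convention, the following user-space sketch returns 0 on success and a negative errno value on failure. buffer_create() and buffer_demo() are hypothetical helpers, and malloc() stands in for a kernel allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper that reports errors the kernel way. */
int buffer_create(size_t len, char **out)
{
    char *buf;

    if (len == 0 || out == NULL)
        return -EINVAL;     /* invalid argument */

    buf = malloc(len);      /* kernel code would use a kernel allocator */
    if (buf == NULL)
        return -ENOMEM;     /* out of memory */

    *out = buf;
    return 0;               /* success */
}

/* Callers test for a negative value and propagate it unchanged. */
int buffer_demo(size_t len)
{
    char *buf;
    int err = buffer_create(len, &buf);

    if (err < 0)
        return err;

    free(buf);
    return 0;
}
```

Propagating the negative value unchanged lets every caller up the chain see the original cause of the failure.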

Strings of characters

In Linux, the kernel programmer is provided with the usual string routines: strcpy(), strncpy(), strlcpy(), strcat(), strncat(), strlcat(), strcmp(), strncmp(), strnicmp(), strchr(), strnchr(), strrchr(), strstr(), strlen(), memset(), memmove(), memcmp(), etc. These functions are declared in the include/linux/string.h header and are implemented in the kernel in the lib/string.c file.

printk

The printf equivalent in the kernel is printk, defined in include/linux/printk.h. The printk() syntax is very similar to printf(). The first parameter of printk() determines the log level of the message, given as one of the KERN_* prefixes, from most to least urgent: KERN_EMERG, KERN_ALERT, KERN_CRIT, KERN_ERR, KERN_WARNING, KERN_NOTICE, KERN_INFO, KERN_DEBUG.

Thus, a warning message in the kernel would be sent by prefixing the format string with KERN_WARNING.

If the logging level is missing from the printk() call, logging is done with the default level at the time of the call. One thing to keep in mind is that messages sent with printk() are visible on the console only if their level is more urgent (numerically lower) than the log level currently set for the console.
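
The console visibility rule comes down to a single comparison. The sketch below is a user-space model, assuming log levels are numbered from 0 (most urgent) to 7 (debug) in the KERN_EMERG to KERN_DEBUG order; the enum and console_shows() are illustrative names, not kernel symbols.

```c
#include <stdbool.h>

/* Numeric values behind the log levels: lower means more urgent. */
enum model_loglevel {
    LVL_EMERG = 0, LVL_ALERT, LVL_CRIT, LVL_ERR,
    LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG
};

/* A message reaches the console only if its level is below the console level. */
bool console_shows(enum model_loglevel msg, int console_loglevel)
{
    return (int)msg < console_loglevel;
}
```

With a console level of 4, for instance, errors are shown but warnings are not; the messages still end up in the kernel log buffer read by dmesg.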

To reduce the size of lines when using printk(), it is recommended to use the corresponding helper macros instead of calling printk() directly: pr_emerg(), pr_alert(), pr_crit(), pr_err(), pr_warn(), pr_notice(), pr_info().

A special case is pr_debug() that calls the printk() function only when the DEBUG macro is defined or if dynamic debugging is used.

Memory allocation

In Linux only resident memory can be allocated, using the kmalloc() call. A typical kmalloc() call is presented below:

As you can see, the first parameter indicates the size in bytes of the allocated area. The function returns a pointer to a memory area that can be directly used in the kernel, or NULL if memory could not be allocated. The second parameter specifies how allocation should be done and the most commonly used values for this are:

GFP_KERNEL: using this value may cause the current process to be suspended. Thus, it cannot be used in interrupt context.
GFP_ATOMIC: using this value ensures that the kmalloc() function does not suspend the current process. It can be used at any time.

The counterpart to the kmalloc() function is kfree(), a function that receives as its argument an area allocated by kmalloc(). This function does not suspend the current process and can therefore be called from any context.

Lists

Because linked lists are often used, the Linux kernel API provides a unified way of defining and using lists. This involves embedding a struct list_head element in the structure we want to treat as a list node. The struct list_head is defined in include/linux/list.h along with the functions and macros that manipulate lists. The following code shows the definition of the struct list_head and the use of an element of this type in another well-known structure in the Linux kernel:

The usual routines for working with lists are the following:

LIST_HEAD(name) is used to declare the sentinel of a list.
INIT_LIST_HEAD(struct list_head *list) is used to initialize the sentinel of a list when it is allocated dynamically, by setting its next and prev fields to point to list itself.
list_add(struct list_head *new, struct list_head *head) adds the new element after the head element.
list_del(struct list_head *entry) removes the item at the entry address from the list it belongs to.
list_entry(ptr, type, member) returns the structure of type type that contains the list element ptr, where member is the name of the struct list_head field within that structure.
list_for_each(pos, head) iterates over a list using pos as a cursor.
list_for_each_safe(pos, n, head) iterates over a list using pos as a cursor and n as a temporary cursor. This macro is used when items are deleted from the list during iteration.

The following code shows how to use these routines:

The evolution of the list can be seen in the following figure:

Note the stack-like behavior introduced by list_add, and the use of a sentinel.

From the above example, it can be seen that the way to define and use a (doubly linked) list is generic and, at the same time, introduces no additional overhead. The struct list_head is used to maintain the links between the list elements. Iterating over the list is also done with this structure, and the enclosing list element is retrieved using list_entry. This idea of implementing and using a list is not new; it was already described in The Art of Computer Programming by Donald Knuth, whose first volume appeared in 1968.
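
To experiment with this design outside the kernel, the primitives can be re-implemented in a few lines. The sketch below is a simplified user-space model, not the code from include/linux/list.h: the macros omit the kernel's type checking, and struct pid_item and sum_and_clear() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Minimal user-space model of the kernel list primitives. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Recover the enclosing structure from a pointer to its list_head member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

/* Insert new_node right after head, which gives stack-like ordering. */
void list_add(struct list_head *new_node, struct list_head *head)
{
    new_node->next = head->next;
    new_node->prev = head;
    head->next->prev = new_node;
    head->next = new_node;
}

/* Unlink entry from whatever list it is on. */
void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Hypothetical payload type with an embedded list node. */
struct pid_item {
    int pid;
    struct list_head list;
};

/* Sum the pids, then unlink every node using the safe iterator. */
int sum_and_clear(struct list_head *head)
{
    struct list_head *pos, *n;
    int sum = 0;

    list_for_each(pos, head)
        sum += list_entry(pos, struct pid_item, list)->pid;

    list_for_each_safe(pos, n, head)
        list_del(pos);

    return sum;
}
```

The safe iterator reads the next pointer before the loop body runs, which is why unlinking the current node inside the loop does not break the traversal.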

Several kernel list functions and macro definitions are presented and explained in the include/linux/list.h header.

Spinlock

spinlock_t (defined in linux/spinlock.h) is the basic type that implements the spinlock concept in Linux. It describes a spinlock, and the operations associated with a spinlock are spin_lock_init(), spin_lock(), spin_unlock(). An example of use is given below:

In Linux, you can use reader-writer spinlocks, useful for readers-writers problems. These types of locks are identified by rwlock_t, and the functions that can work on a reader-writer spinlock are rwlock_init(), read_lock(), write_lock(). An example of use:

Mutex

A mutex is a variable of the struct mutex type (defined in linux/mutex.h). The main functions and macros for working with mutexes are DEFINE_MUTEX() for static definition, mutex_init() for run-time initialization, mutex_lock(), mutex_trylock() and mutex_unlock().

Operations are similar to classic mutex operations in user space or to spinlock operations: the mutex is acquired before entering the critical region and released after leaving it. Unlike spinlocks, these operations can only be used in process context.

Atomic variables

Often, you only need to synchronize access to a simple variable, such as a counter. For this, an atomic_t type can be used (defined in include/linux/atomic.h), which holds an integer value. Below are some operations that can be performed on an atomic_t variable.

Use of atomic variables

A common way of using atomic variables is to store the status of an action (e.g. a flag). So we can use an atomic variable to mark exclusive actions. For example, we consider that an atomic variable can have the LOCKED and UNLOCKED values, and if the variable equals LOCKED then a specific function should return -EBUSY. Such a usage is shown schematically in the code below:

The above code is the equivalent of using a trylock (such as pthread_mutex_trylock()).
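
The trylock-style pattern can be tried out in user space with C11 atomics. In this sketch, atomic_int stands in for atomic_t and atomic_compare_exchange_strong() stands in for the kernel's atomic_cmpxchg(); op_state, op_begin() and op_end() are hypothetical names.

```c
#include <errno.h>
#include <stdatomic.h>

#define UNLOCKED 0
#define LOCKED   1

/* Shared state flag; kernel code would declare an atomic_t instead. */
static atomic_int op_state = UNLOCKED;

/* Try to claim the exclusive action; returns 0 or -EBUSY, never waits. */
int op_begin(void)
{
    int expected = UNLOCKED;

    /* Swap in LOCKED only if the flag is still UNLOCKED, in one atomic step. */
    if (!atomic_compare_exchange_strong(&op_state, &expected, LOCKED))
        return -EBUSY;
    return 0;
}

/* Mark the exclusive action as finished. */
void op_end(void)
{
    atomic_store(&op_state, UNLOCKED);
}
```

Testing and setting the flag in one atomic operation is what makes this safe: a separate read followed by a write would let two callers both see UNLOCKED and proceed.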

We can also use a variable to store the size of a buffer and for atomic updates of the respective variable. The code below is such an example:

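Atomic bitwise operations

The kernel provides a set of functions (in asm/bitops.h) that modify or test bits atomically, such as set_bit(), clear_bit(), change_bit(), test_and_set_bit(), test_and_clear_bit() and test_and_change_bit(). Each takes two arguments, nr and addr.

addr represents the address of the memory area whose bits are being modified or tested and nr is the bit on which the operation is performed.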
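
The meaning of nr and addr can be illustrated with a user-space model built on C11 atomics. The model_* functions below are hypothetical stand-ins that keep the same (nr, addr) argument order; they are not the kernel implementations.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* nr selects a word of the bitmap (nr / BITS_PER_WORD) and a bit inside it. */
void model_set_bit(unsigned int nr, atomic_ulong *addr)
{
    atomic_fetch_or(&addr[nr / BITS_PER_WORD],
                    1UL << (nr % BITS_PER_WORD));
}

void model_clear_bit(unsigned int nr, atomic_ulong *addr)
{
    atomic_fetch_and(&addr[nr / BITS_PER_WORD],
                     ~(1UL << (nr % BITS_PER_WORD)));
}

bool model_test_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long word = atomic_load(&addr[nr / BITS_PER_WORD]);

    return (word >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Sets the bit and returns its previous value in one atomic step. */
bool model_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);

    return atomic_fetch_or(&addr[nr / BITS_PER_WORD], mask) & mask;
}
```

Since nr may exceed the width of one word, addr is treated as the start of an array of words, which is how a bitmap larger than a single unsigned long is addressed.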

Exercises

To solve exercises, you need to perform these steps:

prepare skeletons from templates
build modules
copy modules to the VM
start the VM and test the module in the VM.

The current lab name is kernel_api. See the exercises for the task name.

The skeleton code is generated from full source examples located in tools/labs/templates. To solve the tasks, start by generating the skeleton code for a complete lab:

You can also generate the skeleton for a single task, using

Once the skeleton drivers are generated, build the source:

Then, copy the modules and start the VM:

The modules are placed in /home/root/skels/kernel_api/.

Alternatively, we can copy files via scp, in order to avoid restarting the VM. For additional details about connecting to the VM via the network, please check Connecting to the Virtual Machine.

Review the Exercises section for more detailed information.

Before starting the exercises or generating the skeletons, please run git pull inside the Linux repo, to make sure you have the latest version of the exercises.

If you have local changes, the pull command will fail. Check for local changes using git status. If you want to keep them, run git stash before pull and git stash pop after. To discard the changes, run git reset --hard master.

If you already generated the skeleton before git pull you will need to generate it again.

0. Intro

Using LXR find the definitions of the following symbols in the Linux kernel:

struct list_head
INIT_LIST_HEAD()
list_add()
list_for_each
list_entry
container_of
offsetof

1. Memory allocation in Linux kernel

Generate the skeleton for the task named 1-mem and browse the contents of the mem.c file. Observe the use of the kmalloc() call for memory allocation.

Compile the source code and load the mem.ko module using insmod.
View the kernel messages using the dmesg command.
Unload the kernel module using the rmmod mem command.

Review the Memory Allocation section in the lab.

2. Sleeping in atomic context

Generate the skeleton for the task named 2-sched-spin and browse the contents of the sched-spin.c file.

Compile the source code and load the module, following the instructions above (make build and make copy).
Notice that inserting the module takes 5 seconds to complete.
Unload the kernel module.
Look for the lines marked with TODO 0 to create an atomic section. Re-compile the source code and reload the module into the kernel.

You should now get an error. Look at the stack trace. What is the cause of the error?

In the error message, follow the line containing the BUG for a description of the error. You are not allowed to sleep in atomic context. The atomic context is given by the section between a lock and an unlock operation on a spinlock.

The schedule_timeout() function, combined with the set_current_state macro, forces the current process to wait for 5 seconds.

3. Working with kernel memory

Generate the skeleton for the task named 3-memory and browse the contents of the memory.c file. Notice the comments marked with TODO. You must allocate 4 structures of type struct task_info and initialize them (in memory_init()), then print and free them (in memory_exit()).

(TODO 1) Allocate memory for struct task_info structure and initialize its fields:

The pid field to the PID transmitted as a parameter;
The timestamp field to the value of the jiffies variable, which holds the number of ticks that have occurred since the system booted.

(TODO 2) Allocate struct task_info for the current process, the parent process, the next process, the next process of the next process, with the following information:

PID of the current process, which can be retrieved from the struct task_struct structure returned by the current macro.

Search for pid in task_struct.

PID of the parent process of the current process.

Search for the relevant field in the struct task_struct structure. Look for "parent".

PID of the next process from the list of processes, relative to the current process.

Use the next_task macro, which returns a pointer to the next process (i.e. a struct task_struct structure).

PID of the next process of the next process, relative to the current process.

Call the next_task macro 2 times.

(TODO 3) Display the four structures.

Use printk() to display their two fields:

pid and timestamp .

(TODO 4) Release the memory occupied by the structures (use kfree() ).

You can access the current process using the current macro.
Look for the relevant fields in the struct task_struct structure (pid, parent).
Use the next_task macro. The macro returns a pointer to the next process (i.e., a struct task_struct * value).
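The four lookups from TODO 2 can be modelled in user space as shown here. The struct toy_task type, the toy_next_task macro and the collect_pids() helper are all hypothetical; they only imitate the pid and parent fields and the next_task macro, and are not the kernel definitions.

```c
#include <stddef.h>

/* Toy model of struct task_struct: only the fields the exercise touches. */
struct toy_task {
    int pid;
    struct toy_task *parent;   /* models task_struct.parent */
    struct toy_task *next;     /* models the circular list of tasks */
};

/* Models next_task(): yields a pointer to the following task. */
#define toy_next_task(t) ((t)->next)

/*
 * Fill out[0..3] with the PIDs of, in order:
 * the current task, its parent, the next task, the next task of the next task.
 */
static void collect_pids(struct toy_task *cur, int out[4])
{
    out[0] = cur->pid;
    out[1] = cur->parent->pid;
    out[2] = toy_next_task(cur)->pid;
    out[3] = toy_next_task(toy_next_task(cur))->pid;
}
```

Because the task list is circular, applying the macro twice can lead back to a task already seen (for instance on a toy list with only three entries), so the four PIDs are not guaranteed to be distinct.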

The struct task_struct structure contains two fields to designate the parent of a task:

real_parent points to the process that created the task, or to the process with pid 1 (init) if the original parent has finished executing.
parent points to the task's current parent (the process that is notified when the task finishes executing).

In general, the values of the two fields are the same, but there are situations where they differ, for example when using the ptrace() system call.

Review the Memory allocation section in the lab.

4. Working with kernel lists

Generate the skeleton for the task named 4-list. Browse the contents of the list.c file and notice the comments marked with TODO. The current process will add the four structures from the previous exercise to a list. The list is built in the task_info_add_for_current() function, which is called when the module is loaded. The list is printed in the list_exit() function and deleted in the task_info_purge_list() function.

(TODO 1) Complete the task_info_add_to_list() function to allocate a struct task_info structure and add it to the list.
(TODO 2) Complete the task_info_purge_list() function to delete all the elements in the list.
Compile the kernel module. Load and unload the module by following the messages displayed by the kernel.

Review the Lists section of the lab. When deleting items from the list, you need to use either the list_for_each_safe or the list_for_each_entry_safe macro, because they save the pointer to the next element before the current one is freed.
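Why a "safe" iteration is needed when deleting can be seen in a small userspace model. The struct node type and the list_init(), list_add_pid() and list_purge() helpers are hypothetical; the loop in list_purge() follows the same pattern as list_for_each_safe, keeping a second pointer to the next element so that the current one can be freed.

```c
#include <stdlib.h>

/* Minimal circular doubly linked list; the head is a node that holds no data. */
struct node {
    int pid;
    struct node *prev, *next;
};

/* Make head an empty list: it points to itself in both directions. */
static void list_init(struct node *head)
{
    head->prev = head;
    head->next = head;
}

/* Insert a new node right after head. Returns 0 on success, -1 on failure. */
static int list_add_pid(struct node *head, int pid)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;

    n->pid = pid;
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
    return 0;
}

/* Unlink and free every node. Returns the number of nodes freed. */
static int list_purge(struct node *head)
{
    struct node *p, *tmp;
    int freed = 0;

    /* tmp always holds the successor, read BEFORE p is freed. */
    for (p = head->next, tmp = p->next; p != head; p = tmp, tmp = p->next) {
        p->prev->next = p->next;
        p->next->prev = p->prev;
        free(p);
        freed++;
    }
    return freed;
}
```

A plain loop that advanced with p = p->next after free(p) would read freed memory; saving the successor first avoids that.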

5. Working with kernel lists for process handling

Generate the skeleton for the task named 5-list-full. Browse the contents of the list-full.c file and notice the comments marked with TODO. In addition to the 4-list functionality, we add the following:

A count field showing how many times a process has been «added» to the list.

If a process is «added» several times, no new entry is created in the list, but:

Update the timestamp field.
Increment count .

To implement the counter facility, add a task_info_find_pid() function that searches for a pid in the existing list.

If found, return a pointer to the task_info structure. If not, return NULL.

An expiration facility. A process is considered expired, and is removed from the list, if it was added more than 3 seconds ago and its count is not greater than 5.

The expiration facility is already implemented in the task_info_remove_expired() function.
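The counter facility described above can be sketched in user space like this. It is a simplified model, not the lab solution: it uses a hand-rolled singly linked list instead of the kernel list API, a plain int instead of an atomic counter, a hypothetical task_info_add() helper, and it assumes that a freshly created entry starts with count 0.

```c
#include <stdlib.h>

struct task_info {
    int pid;
    unsigned long timestamp;
    int count;                  /* plain int here; assumed atomic in the lab */
    struct task_info *next;
};

/* Return the entry with the given pid, or NULL if it is not in the list. */
static struct task_info *task_info_find_pid(struct task_info *head, int pid)
{
    struct task_info *ti;

    for (ti = head; ti != NULL; ti = ti->next)
        if (ti->pid == pid)
            return ti;
    return NULL;
}

/*
 * "Add" a pid: refresh the existing entry if there is one,
 * otherwise allocate a new entry and push it at the front.
 * Returns the entry, or NULL if the allocation fails.
 */
static struct task_info *task_info_add(struct task_info **headp, int pid,
                                       unsigned long now)
{
    struct task_info *ti = task_info_find_pid(*headp, pid);

    if (ti != NULL) {
        ti->timestamp = now;    /* update the timestamp field */
        ti->count++;            /* increment count */
        return ti;
    }

    ti = malloc(sizeof(*ti));
    if (ti == NULL)
        return NULL;

    ti->pid = pid;
    ti->timestamp = now;
    ti->count = 0;              /* assumption: new entries start at 0 */
    ti->next = *headp;
    *headp = ti;
    return ti;
}
```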

(TODO 1) Implement the task_info_find_pid() function.

(TODO 2) Change a field of an item in the list so that it does not expire. The item must fail at least one part of the expiration condition from task_info_remove_expired().

For TODO 2, take the first element of the list (the one referred to by head.next) and set its count field to a large enough value. Use the atomic_set() function.
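The expiration rule, and the reason a large count protects an entry, can be illustrated with a small userspace predicate. This is a sketch of the condition as described in the text, not the body of task_info_remove_expired(); the task_info_expired() name and the TICKS_PER_SEC constant (a stand-in for HZ) are assumptions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's HZ (timer ticks per second). */
#define TICKS_PER_SEC 100UL

/*
 * An entry is expired when BOTH parts hold:
 *   - it was added more than 3 seconds ago, and
 *   - its count is not greater than 5.
 * Making either part false keeps the entry in the list.
 */
static bool task_info_expired(unsigned long now, unsigned long timestamp,
                              int count)
{
    bool too_old = (now - timestamp) > 3 * TICKS_PER_SEC;
    bool rarely_added = count <= 5;

    return too_old && rarely_added;
}
```

With this rule, an old entry whose count was set to, say, 10 is no longer removed, which is the effect TODO 2 asks for.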

Compile, copy, load and unload the kernel module, following the displayed messages. Loading the kernel module takes some time, because the module sleeps by calling the schedule_timeout() function.

6. Synchronizing list work

Generate the skeleton for the task named 6-list-sync.

Browse the code and look for the TODO 1 string.
Use a spinlock or a read-write lock to synchronize access to the list.
Compile, load and unload the kernel module.

Always lock data, not code!

Read the Spinlock section of the lab.
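"Lock data, not code" can be illustrated with a userspace sketch in which the lock is a member of the structure it protects and every access to that structure goes through it. This is not the kernel spinlock API: it is a toy busy-wait lock built on C11 atomic_flag, and the struct pid_table type and the table_* helpers are hypothetical.

```c
#include <stdatomic.h>

#define TABLE_CAP 8

/* The lock is part of the data it protects. */
struct pid_table {
    atomic_flag lock;
    int pids[TABLE_CAP];
    int len;
};

static void table_init(struct pid_table *t)
{
    atomic_flag_clear(&t->lock);
    t->len = 0;
}

/* Busy-wait until the flag was previously clear, i.e. we now own the lock. */
static void table_lock(struct pid_table *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;
}

static void table_unlock(struct pid_table *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Append a pid. Returns 0 on success, -1 if the table is full. */
static int table_add(struct pid_table *t, int pid)
{
    int ret = -1;

    table_lock(t);
    if (t->len < TABLE_CAP) {
        t->pids[t->len++] = pid;
        ret = 0;
    }
    table_unlock(t);            /* single exit path: the lock is always released */
    return ret;
}

/* Return 1 if pid is stored in the table, 0 otherwise. */
static int table_contains(struct pid_table *t, int pid)
{
    int i, found = 0;

    table_lock(t);
    for (i = 0; i < t->len; i++)
        if (t->pids[i] == pid)
            found = 1;
    table_unlock(t);
    return found;
}
```

Readers take the lock as well as writers, and nothing sleeps while the lock is held, which matches the rule about atomic context stated earlier.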

7. Test module calling in our list module

Generate the skeleton for the task named 7-list-test and browse the contents of the list-test.c file. We'll use it as a test module. It will call functions exported by the module from the 6-list-sync task. The exported functions are the ones marked with extern in the list-test.c file.

Uncomment the commented code in list-test.c. Look for TODO 1.

To export the above functions from the module located in the 6-list-sync/ directory, the following steps are required:

Functions must not be static.
Use the EXPORT_SYMBOL macro to export the kernel symbols. For example: EXPORT_SYMBOL(task_info_remove_expired); . The macro must be used for each function, after the function is defined. Browse the code and look for the TODO 2 string in list-sync.c.
In the 6-list-sync module, remove the code that prevents a list item from expiring (it conflicts with this exercise).
Compile and load the module from 6-list-sync/. Once loaded, it exposes the exported functions, which can then be used by the test module. You can check this by searching for the function names in /proc/kallsyms before and after loading the module.
Compile the test module and then load it.
Use lsmod to check that the two modules have been loaded. What do you notice?
Unload the kernel test module.

What should be the unload order of the two modules (the module from 6-list-sync and the test module)? What happens if you use another order?
