How Linux Works: What Every Superuser Should Know, 2nd Edition

Chapter 1: The big picture

Levels and Layers of Abstraction in a Linux System

A layer or level is a classification (or grouping) of a component according to where that component sits between the user and the hardware. A Linux system has three main levels: hardware, the kernel, and processes, which collectively make up user space. The kernel runs in kernel mode, which has unrestricted access to the processor and main memory. User processes run in user mode, which restricts access to a (usually quite small) subset of memory and safe CPU operations.

Hardware: understanding main memory

The running kernel and processes reside in main memory, the most important part of the hardware; a CPU is just an operator on memory. A state, in reference to memory, is a particular arrangement of bits.

One of the kernel’s tasks is to split memory into many subdivisions, and it must maintain certain state information about those subdivisions at all times. The kernel is in charge of managing tasks in four general system areas:

Process management describes the starting, pausing, resuming, and terminating of processes. The act of one process giving up control of the CPU to another process is called a context switch, and the kernel is responsible for carrying it out.

Memory management: the kernel must manage memory during a context switch. To help with this, CPUs include a memory management unit (MMU), which acts as a proxy between the memory addresses a process uses and the physical location of the memory itself. The implementation of a memory address map is called a page table.

Device Drivers and Management

A device is typically accessible only in kernel mode because improper access could crash the machine, and because different devices rarely share the same programming interface.

System Calls and Support

System calls (or syscalls) perform specific tasks that a user process alone cannot do well or at all. Two important ones are fork() and exec().

Other than init, all user processes on a Linux system start as a result of fork(), and most of the time, you also run exec() to start a new program instead of running a copy of an existing process.
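The fork-then-exec pattern can be observed from the shell itself. The following is a minimal sketch, assuming a POSIX sh on Linux; the variable names are illustrative and do not come from the text. It compares process IDs to show that running a command creates a new process, while exec replaces the program inside an existing one.

```sh
# Running a command makes the shell fork() a child, which then exec()s
# the new program, so the child has a different PID from the parent.
parent_pid=$$
child_pid=$(sh -c 'echo $$')

# exec replaces the running program without creating a new process,
# so the PID printed before and after exec is the same.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
pid_before_exec=$(printf '%s\n' "$pids" | sed -n 1p)
pid_after_exec=$(printf '%s\n' "$pids" | sed -n 2p)

echo "parent=$parent_pid child=$child_pid"
echo "before exec=$pid_before_exec after exec=$pid_after_exec"
```

The actual PID values differ on every run; only the relationships between them are meaningful.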

The main memory that the kernel allocates for user processes is called user space. Because a process is simply a state (or image) in memory, user space also refers to the memory for the entire collection of running processes.

A user is an entity that can run processes and own files. A user is associated with a username, but the kernel does not manage usernames; instead, it identifies users by simple numeric identifiers called user IDs. Users exist primarily to support permissions and boundaries. Every user-space process has a user owner, and processes are said to run as the owner. Groups are sets of users. Their primary purpose is to allow a user to share file access with other users in a group.

Chapter 2: Basic Commands and Directory Hierarchy

The Bourne shell: /bin/sh

Every Unix system needs the Bourne shell in order to function correctly. The bash shell is the default shell on most Linux distributions, and /bin/sh is normally a link to bash or to another Bourne-compatible shell (such as dash on Debian and Ubuntu). When you install Linux, you should create at least one regular user in addition to the root user.

Using the Shell

The Shell Window

Standard Input and Standard Output

cat outputs the contents of one or more files. Unix processes use I/O streams to read and write data. The source of an input stream can be a file, a device, a terminal, or even the output stream from another process. Pressing Ctrl-D on an empty line ends the current standard input entry from the terminal, whereas Ctrl-C terminates the process. Standard output is similar: the kernel gives each process a standard output stream where it can write its output.

Some basic commands:

Some essential directory commands:

Shell Globbing (Wildcards)

The shell can match simple patterns to file and directory names, a process known as globbing. For example, * matches any number of arbitrary characters. The shell matches arguments containing globs to filenames, substitutes the filenames for those arguments, and then runs the revised command line; this substitution is called expansion. The glob character ? instructs the shell to match exactly one arbitrary character. If you don't want the shell to expand a glob in a command, enclose the glob in single quotes.
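A short sketch of expansion in practice, assuming a POSIX sh; the scratch directory and the filenames are made up for illustration. It runs in a temporary directory so the globs match a known set of files.

```sh
# Work in a throwaway directory with a known set of files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch a.txt b.txt cc.txt notes.md

all_txt=$(echo *.txt)     # * matches any number of characters
one_char=$(echo ?.txt)    # ? matches exactly one character
quoted=$(echo '*.txt')    # single quotes prevent expansion

echo "$all_txt"     # a.txt b.txt cc.txt
echo "$one_char"    # a.txt b.txt
echo "$quoted"      # *.txt
```

Note that echo never sees the glob in the first two cases; the shell has already replaced it with the matching filenames before echo runs.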

Other essential intermediate Unix commands:

Changing your Password and shell

Use the passwd command to change your password; it asks for the old password before allowing you to enter the new one.

Files that start with a dot are not shown by default by some programs like ls (unless you run it with the -a option). Shell globs don’t match dot files unless you explicitly use a pattern such as .* .

Environment and shell variables

To assign a value to a shell variable, use the equal sign (=), as in TEST='This is a test' , and access the variable with a dollar sign, as in echo $TEST . To create an environment variable (one that is not specific to the shell and that child processes inherit), use the export command, as in export TEST='This is a test' .
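The difference between a shell variable and an environment variable shows up when a child process tries to read it. A minimal sketch, assuming a POSIX sh; the variable names other than TEST are illustrative.

```sh
# Start from a clean slate in case TEST is already in the environment.
unset TEST

# A plain shell variable is visible only inside this shell.
TEST='This is a test'
in_child_before=$(sh -c 'echo "${TEST:-unset}"')

# After export, child processes inherit the variable.
export TEST
in_child_after=$(sh -c 'echo "$TEST"')

echo "before export: $in_child_before"   # before export: unset
echo "after export: $in_child_after"     # after export: This is a test
```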

The Command Path

PATH is an environmental variable that contains a list of system directories that the shell searches when trying to locate a command.

* ~ -> Negation, directory shortcut
* # -> Comments, preprocessor, substitutions
* [ ] -> Ranges
* { } -> Statement blocks, ranges
* _ -> Cheap substitute for a space

There are two de facto standard text editors in Unix: vi (faster) and emacs (can do almost anything, but requires some extra typing to get at its features).

Getting online Help

For basic commands, the manual pages (or man pages) will tell you everything: man ls . To search for a manual page by keyword, use the -k option. Manual pages are organized into numbered sections, and you can select a manual page by section, as in man 5 passwd . There is another documentation format called info, which you read with info command . Some packages dump their available documentation into /usr/share/doc.

Shell Input and Output

To send the output of a command to a file instead of the terminal, use the > redirection character. The shell creates the file if it does not already exist. If the file exists, the shell erases (clobbers) the original contents first. In bash, run set -C to avoid clobbering. Use >> to append the output to the file instead of overwriting it. To send the standard output of a command to the standard input of another command, use the pipe character | .

Standard error (stderr) is an additional output stream for diagnostics and debugging. You can redirect the standard error using the 2> syntax (2 is the stream ID that the shell modifies), like this: ls /fffffffff > f 2> e . You can also send the standard error to the same place as stdout with the >& notation, as in ls /fffffffff > f 2>&1 .
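The redirection forms above can be tried safely in a scratch directory. This is a sketch under the assumption of a POSIX sh; the file names out.txt, f, e, and both, and the nonexistent path, are illustrative.

```sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo first > out.txt     # creates out.txt
echo second > out.txt    # clobbers the previous contents
echo third >> out.txt    # appends instead of overwriting
contents=$(cat out.txt)

# ls fails on purpose here: stdout goes to f, the error message to e.
ls "$workdir/no_such_file" > f 2> e || true
if [ -s f ]; then stdout_empty=no; else stdout_empty=yes; fi
if [ -s e ]; then stderr_captured=yes; else stderr_captured=no; fi

# 2>&1 sends stderr to the same place as stdout.
ls "$workdir/no_such_file" > both 2>&1 || true
if [ -s both ]; then combined_captured=yes; else combined_captured=no; fi

# A pipe connects one command's stdout to the next command's stdin.
first_sorted=$(printf 'b\na\n' | sort | head -n 1)

echo "$contents"
echo "stdout empty: $stdout_empty, stderr captured: $stderr_captured"
```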

Standard Input Redirection

To channel a file to a program's standard input, use the < operator, as in head < file , where file stands for the file to read.

Understanding Error Messages

Unlike messages from other operating systems, Unix errors usually tell you exactly what went wrong.

Anatomy of a UNIX Error Message

A UNIX error message usually consists of several parts, such as the command that failed, the file it was acting upon, and the error message itself. When troubleshooting errors, always address the first error first.

A list of the most common error messages:

Listing and Manipulating Processes

For a quick listing of running processes, just run ps on the command line. This displays the following fields:

The ps command has some useful options:

To check on a specific process, add its PID to the argument list of the ps command. To inspect the current shell process, you could use ps u $$ , because $$ is a shell variable that evaluates to the current shell’s PID.

To terminate a process, send it a signal with the kill command. When you run kill, you're asking the kernel to send a signal to another process. A signal is a message to a process from the kernel. There are many types of signals. The default is TERM, or terminate. You can choose a different signal with an option, for example to freeze a process: kill -STOP pid . To continue a previously stopped process, use CONT: kill -CONT pid .
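The STOP, CONT, and TERM signals can be tried on a harmless background process. This is a sketch that assumes Linux (it reads the process state from /proc) and a POSIX sh; the sleep command and the variable names are illustrative.

```sh
# Start a long-running process in the background and remember its PID.
sleep 30 &
pid=$!

# Freeze it. Signal delivery is asynchronous, so poll /proc briefly
# until the kernel reports the process state as T (stopped).
kill -STOP "$pid"
stopped_state=unknown
i=0
while [ "$i" -lt 20 ]; do
    stopped_state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
    [ "$stopped_state" = "T" ] && break
    sleep 0.1
    i=$((i + 1))
done
echo "state after STOP: $stopped_state"

# Resume it, then send the default signal (TERM).
kill -CONT "$pid"
kill "$pid"

# wait reports 128 + the signal number for a process killed by a signal.
exit_status=0
wait "$pid" || exit_status=$?
echo "exit status: $exit_status"
```

TERM is signal number 15, so the shell reports an exit status of 143 for the terminated process.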

Shells also support job control, using various keystrokes and commands. You can send a TSTP signal (similar to STOP) with ctrl-Z, then start the process again by entering fg (bring to foreground) or bg (move to background).

When you run a Unix command from the shell, you don't get the shell prompt back until the program finishes executing. To detach a process from the shell and put it in the background, use & , as in gunzip file.gz & . The process will continue to run after you log out. If a program tries to read something from the standard input when it's in the background, it can freeze. The bash shell and most full-screen interactive programs support Ctrl-L to redraw the entire screen.

File Modes and Permissions

Every Unix file has a set of permissions that determine whether you can read, write, or run the file. The first character of the mode is the file type: a dash (-) in this position denotes a regular file, and a d denotes a directory. The rest of a file's mode contains the permissions, which break down into three sets: user, group, and other. Some executable files have an s in the user permissions listing instead of an x. This indicates that the executable is setuid, meaning that when you execute the program, it runs as though the file owner is the user instead of you.

To change permissions, use the chmod command. The general form is chmod (target)[+-](permissions) , where target is u for user, g for group, or o for others. You can also set permissions with numbers; for example, 644 sets read/write for the user and read for group and other. You can specify default permissions with the umask shell command, which masks out a predefined set of permissions from any new file you create (the most common value is 022).
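The interaction between umask and chmod can be checked in a scratch directory. This sketch assumes Linux with a stat that supports -c (GNU coreutils or BusyBox); the file and variable names are illustrative.

```sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# With umask 022, new files get 644 and new directories get 755.
umask 022
touch newfile
mkdir newdir
file_mode=$(stat -c %a newfile)
dir_mode=$(stat -c %a newdir)

# Numeric form: read/write for the user only.
chmod 600 newfile
# Symbolic form: add read permission for group, then for others.
chmod g+r newfile
after_group=$(stat -c %a newfile)
chmod o+r newfile
after_other=$(stat -c %a newfile)

echo "new file: $file_mode, new dir: $dir_mode"   # new file: 644, new dir: 755
echo "after g+r: $after_group"                    # after g+r: 640
echo "after o+r: $after_other"                    # after o+r: 644
```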

A symbolic link is a file that points to another file or a directory, effectively creating an alias (a name that points to another name). In the output of ls -l, a symbolic link is denoted by -> followed by its target.

Creating Symbolic Links

To create a symbolic link named linkname that points to target, use ln -s target linkname .
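The argument order of ln -s is easy to get backwards, so a quick check helps. A minimal sketch, assuming a POSIX sh with readlink available; target.txt and the variable names are illustrative.

```sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo hello > target.txt
ln -s target.txt linkname            # target first, link name second

link_points_to=$(readlink linkname)  # the name stored in the link
via_link=$(cat linkname)             # reading follows the link
if [ -L linkname ]; then is_symlink=yes; else is_symlink=no; fi

echo "linkname -> $link_points_to"   # linkname -> target.txt
echo "$via_link"                     # hello
```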

Archiving and Compressing Files

The gzip (GNU Zip) program is one of the current standard Unix compression programs. A file that ends with .gz is a GNU Zip archive. gzip doesn't pack multiple files and directories into one file; for that, use tar.

To create an archive, run tar cvf archive.tar file1 file2 ... . To unpack a .tar file, use the x flag: tar xvf archive.tar . You can list the contents of an archive before unpacking it with the t flag. Use p to preserve permissions.

Compressed Archives (.tar.gz)

To unpack a compressed archive, work from the right side to the left; get rid of the .gz first and then worry about the .tar (and inverse to compress).

The previous method isn't the fastest or most efficient way to invoke tar on a compressed archive, and it wastes disk space and kernel I/O time. A better way is to combine archival and compression functions with a pipeline: zcat file.tar.gz | tar xvf - . The zcat command is the same as gunzip -dc. The -d option decompresses and the -c option sends the result to standard output. The z option in tar is a shortcut for this pipeline: tar ztvf f.tar.gz .
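Both styles, the z shortcut and the explicit pipeline, can be compared on a small archive. A sketch assuming tar and gzip are installed; the directory and file names are made up for illustration.

```sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir src out
echo one > src/file1
echo two > src/file2

# Create a compressed archive in one step with the z option.
tar czf archive.tar.gz src

# List the contents without unpacking (t mode).
listing=$(tar tzf archive.tar.gz)

# Unpack with the pipeline form: decompress to stdout, let tar read stdin.
(cd out && gunzip -dc ../archive.tar.gz | tar xf -)
extracted=$(cat out/src/file1 out/src/file2)

echo "$listing"
echo "$extracted"
```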

Other Compression Utilities

There are other compression utilities, such as bzip2. The bzip2 compression/decompression option for tar is j.

Linux Directory Hierarchy Essentials

Here are the most important subdirectories in root:

Other Root Subdirectories

There are a few other interesting subdirectories in the root directory:

The /usr Directory

In addition to /usr/bin, /usr/sbin, and /usr/lib, /usr contains the following:

On Linux systems, the kernel is normally in /vmlinuz or /boot/vmlinuz. A boot loader loads this file into memory and sets it in motion when the system boots. You'll also find many modules that the kernel can load and unload on demand during the course of normal system operation; these are called loadable kernel modules.

Running Commands as the Superuser

You can run the su command and enter the root password to start a root shell. This practice works, although it has several disadvantages:
* You have no record of system-altering commands
* You have no record of the users who performed system-altering commands
* You don't have access to your normal shell environment
* You have to enter the root password

sudo allows administrators to run commands as root when they are logged in as themselves. When you run this command, sudo logs this action with the syslog service under the local2 facility.

To run sudo, you must configure the privileged users in your /etc/sudoers file. An example of this file is below:
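A minimal sketch of such a file, reconstructed from the description that follows; user1 and user2 are placeholder usernames, not names taken from the text:

```
User_Alias ADMINS = user1, user2

ADMINS ALL = NOPASSWD: ALL

root ALL=(ALL) ALL
```

On a real system, this file is normally edited with visudo, which checks the syntax before saving.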

The first line defines an ADMINS user alias with the two users, and the second line grants the privileges. The ALL = NOPASSWD: ALL part means that the users in the ADMINS alias can use sudo to execute commands as root without entering a password. The second ALL means any command. The first ALL means any host. The root ALL=(ALL) ALL line means that the superuser may also use sudo to run any command on any host. The extra (ALL) means that the superuser may also run commands as any other user. You can extend this privilege to the ADMINS users by adding (ALL) to the /etc/sudoers line: ADMINS ALL = (ALL) NOPASSWD: ALL

Chapter 3: Devices

It is easy to manipulate most devices on a Unix system because the kernel presents many of the device I/O interfaces to user processes as files. Some devices are also accessible to standard programs like cat , although not all devices or device capabilities are accessible with standard file I/O. If you list the contents of /dev with ls -l, the first character of each mode tells you the type of device: b for block, c for character, p for pipe, and s for socket.

The sysfs Device Path

The kernel assigns devices in the order in which they are found, so a device may have a different name between reboots. The Linux kernel offers the sysfs interface, a system of files and directories, to provide a uniform view of attached devices based on their actual hardware attributes. The base path for devices is /sys/devices. There are a few shortcuts in the /sys directory; for example, /sys/block should contain all of the block devices available on a system. To find the sysfs location of a device in /dev, use the udevadm command: udevadm info --query=all --name=/dev/sda

The program dd is extremely useful when working with block and character devices. Its function is to read from an input file or stream and write to an output file or stream, copying data in blocks of a fixed size. Usage: dd if=/dev/zero of=new_file bs=1024 count=1
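The effect of bs and count can be confirmed by measuring the output file. A small sketch, assuming a POSIX sh on a system with /dev/zero; the scratch directory and variable names are illustrative.

```sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Copy one 1024-byte block from /dev/zero (dd reports statistics on stderr).
dd if=/dev/zero of=new_file bs=1024 count=1 2>/dev/null

# bs * count bytes were written, and every one of them is a zero byte.
size=$(wc -c < new_file | tr -d ' ')
nonzero_bytes=$(tr -d '\000' < new_file | wc -c | tr -d ' ')

echo "size: $size"                    # size: 1024
echo "nonzero bytes: $nonzero_bytes"  # nonzero bytes: 0
```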

Device Name Summary

It can sometimes be difficult to find the name of a device. Some common Linux devices and their naming conventions are:

Hard Disks: /dev/sd*

These devices represent entire disks; the kernel makes separate device files, such as /dev/sda1 and /dev/sda2, for the partitions on a disk. The sd portion of the name stands for SCSI disk (Small Computer System Interface). To list the SCSI devices on your system, use a utility that walks the device paths provided by sysfs. Most modern Linux systems use the Universally Unique Identifier (UUID) for persistent disk device access.

CD and DVD Drives: /dev/sr*

The /dev/sr* devices are read only, and they are used only for reading from discs.

PATA Hard Disks: /dev/hd*

The Linux block devices /dev/hd* are common on older versions of the Linux kernel and with older hardware.

Terminals: /dev/tty*, /dev/pts/*, and /dev/tty

Terminals are devices for moving characters between a user process and an I/O device, usually for text output to a terminal screen. Pseudoterminal devices are emulated terminals that understand the I/O features of real terminals. Two common terminal devices are /dev/tty1 (the first virtual console) and /dev/pts/0 (the first pseudoterminal device). The /dev/tty device is the controlling terminal of the current process. If a program is currently reading and writing to a terminal, this device is a synonym for that terminal. A process does not need to be attached to a terminal. Linux has two primary display modes: text mode and an X Window System server (graphics mode). You can switch between the different virtual environments with the ctrl-alt-Function keys.

Parallel Ports: /dev/lp0 and /dev/lp1

These devices represent an interface type that has largely been replaced by USB. You can send files (such as a file to be printed) directly to a parallel port with the cat command.

Audio Devices: /dev/snd/*, /dev/dsp, /dev/audio, and More

Linux has two sets of audio devices. There are separate devices for the Advanced Linux Sound Architecture (ALSA in /dev/snd) system interface and the older Open Sound System (OSS).

Creating Device Files

The mknod command creates a single device file, although creating device files by hand this way is deprecated. You must know the device name as well as its major and minor numbers.

The Linux kernel can send notifications to a user-space process (named udevd) upon detecting a new device on the system. The user-space process on the other end examines the new device’s characteristics, creates a device file, and then performs any device initialization.

The devtmpfs filesystem was developed in response to the problem of device availability during boot. This filesystem is similar to the older devfs support, but it’s simplified. The kernel creates device files as necessary, but it also notifies udevd that a new device is available.

udevd Operation and Configuration

The udevd daemon operates as follows:

The udevadm program is an administration tool for udevd. You can reload udevd rules and trigger events, but perhaps the most powerful features of udevadm are the ability to search for and explore system devices and the ability to monitor uevents as udevd receives them from the kernel.

To monitor uevents with udevadm, use the monitor command: udevadm monitor

In-Depth: SCSI and the Linux Kernel

The traditional SCSI hardware setup is a host adapter linked with a chain of devices over a SCSI bus. The host adapter is attached to a computer. The host adapter and devices each have a SCSI ID, and there can be 8 or 16 IDs per bus, depending on the SCSI version. The host adapter communicates with the devices through the SCSI command set in a peer-to-peer relationship; the devices send responses back to the host adapter. The SCSI subsystem and its three layers of drivers can be described as:

USB Storage and SCSI

USB is quite similar to SCSI—it has device classes, buses, and host controllers. The Linux kernel includes a three-layer USB subsystem that closely resembles the SCSI subsystem, with device-class drivers at the top, a bus management core in the middle, and host controller drivers at the bottom.

To connect the SATA-specific drivers of the kernel to the SCSI subsystem, the kernel employs a bridge driver, as with the USB drives. The optical drive speaks ATAPI, a version of SCSI commands encoded in the ATA protocol.

Chapter 4: Disks and Filesystems

Partitions are subdivisions of the whole disk. On Linux, they're denoted with a number after the whole block device. The kernel presents each partition as a block device, just as it would an entire disk. Partitions are defined on a small area of the disk called a partition table. The next layer after the partition is the filesystem, the database of files and directories that you're accustomed to interacting with in user space.

Partitioning disk devices

There are many kinds of partition tables. The traditional table is the one found inside the Master Boot Record (MBR). A newer standard is the Globally Unique Identifier Partition Table (GPT). Some Linux partition tools are:

Viewing a Partition Table

You can view your system's partition table with parted -l. An MBR partition can be of type primary, extended, or logical. A primary partition is a normal subdivision of the disk. The basic MBR has a limit of four primary partitions, so if you want more than four, you designate one partition as an extended partition and subdivide it into logical partitions.

Changing Partition Tables

Changing the partition table makes it quite difficult to recover any data on partitions that you delete because it changes the initial point of reference for a filesystem. You need to ensure that no partitions on your target disk are currently in use. Different tools can be used to create partitions, such as parted, gparted, gdisk, and fdisk. fdisk and parted modify the partitions entirely in user space.

Disk and Partition Geometry

A traditional hard disk consists of a spinning platter on a spindle, with a head attached to a moving arm that can sweep across the radius of the disk. Even though you can think of a hard disk as a block device with random access to any block, there are serious performance consequences if you aren't careful about how you lay out data on the disk.

Solid-State Disks (SSDs)

In solid-state disks (SSDs), random access is not a problem because there's no head to sweep across a platter, but other factors, such as partition alignment, still affect performance. Data is read in blocks of a fixed size, so if a piece of data straddles two blocks, you need two reads even when the amount of data is smaller than the block size.

A filesystem is a form of database; it supplies the structure to transform a simple block device into the sophisticated hierarchy of files and subdirectories that users can understand. Filesystems are traditionally implemented in the kernel, but File System in User Space (FUSE) allows them to be implemented in user space as well, and the kernel's Virtual File System (VFS) abstraction layer provides a common interface to all of them.

These are the most common types of filesystems for data storage. The type names as recognized by Linux are in parentheses next to the filesystem names:

Creating a Filesystem

As with partitioning, you create filesystems in user space, because a user-space process can directly access and manipulate a block device. The mkfs utility can create many kinds of filesystems.

mkfs is only a frontend for a series of filesystem creation programs named mkfs.fs, where fs is a filesystem type. So when you run mkfs -t ext4, mkfs in turn runs mkfs.ext4. These programs are located in /sbin (as /sbin/mkfs.*).

Mounting a Filesystem

The process of attaching a filesystem is called mounting. When the system boots, the kernel reads some configuration data and mounts root (/) based on that data. In order to mount a filesystem, you must know the following:

Device names can change because they depend on the order in which the kernel finds the devices. To solve this problem, you can identify and mount filesystems by their Universally Unique Identifier (UUID), a software standard. To view a list of devices and the corresponding filesystems and UUIDs on your system, use the blkid program. To mount a filesystem by its UUID, pass UUID= followed by the identifier to mount in place of the device name.

Disk Buffering, Caching, and Filesystems

Unix buffers writes to the disk. When you unmount a filesystem with umount, the kernel automatically synchronizes with the disk. At any other time, you can force the kernel to write the changes in its buffer to the disk by running the sync command.

Filesystem Mount Options

Some important options of the mount command:

Remounting a Filesystem

To reattach a currently mounted filesystem at the same mount point with different mount options, use mount -o remount along with the mount point.

The /etc/fstab Filesystem Table

Linux systems keep a permanent list of filesystems and options in /etc/fstab. Each line describes a filesystem with the following fields:

Alternatives to /etc/fstab

There are two alternatives to the /etc/fstab file: an /etc/fstab.d directory that contains individual filesystem configuration files, and systemd units configured for the filesystems.

To view the size and utilization of your currently mounted filesystems, use the df command. The output contains:

If your disk fills up and you need to know where the space is allocated, use the du command.

Checking and Repairing Filesystems

If errors exist in a mounted filesystem, data loss and system crashes may result. The tool to check a filesystem is fsck (there is a different version of fsck for each filesystem type that Linux supports). To run fsck in interactive manual mode, give the device or the mount point (as listed in /etc/fstab) as the argument. The -p option fixes ordinary problems automatically, and -n checks the filesystem without modifying anything.

Checking ext3 and ext4 Filesystems

You may wish to mount a broken ext3 or ext4 filesystem in ext2 mode because the kernel will not mount an ext3 or ext4 filesystem with a nonempty journal. To flush the journal in an ext3 or ext4 filesystem to the regular filesystem database, run e2fsck as follows: e2fsck -fy /dev/disk_device

The debugfs tool allows you to look through the files on a filesystem and copy them elsewhere.

Most versions of Unix have filesystems that serve as system interfaces. That is, rather than serving only as a means to store data on a device, a filesystem can represent system information such as process IDs and kernel diagnostics. The special filesystem types in common use on Linux include the following:

If you run out of real memory, the Linux virtual memory system can automatically move pieces of memory to and from disk storage, which is called swapping. The disk area used to store memory pages is called swap space. To view current swap usage, use the free command.

Using a Disk Partition as Swap Space

To use an entire disk partition as swap, make sure the partition is empty and run mkswap dev , where dev is the partition’s device. Last, execute swapon dev to register the space with the kernel.

Using a File as Swap Space

Use these commands to create an empty file, initialize it as swap, and add it to the swap pool:

How Much Swap Do You Need?

Unix conventional wisdom said you should always reserve at least twice as much swap as you have real memory.

Inside a Traditional Filesystem

A traditional Unix filesystem has two primary components: a pool of data blocks where you can store data and a database system that manages the data pool. The database is centered around the inode data structure. An inode is a set of data that describes a particular file, including its type, permissions, and where in the data pool the file data resides. Inodes are identified by numbers listed in an inode table.

Viewing Inode Details

To view the inode numbers for any directory, use the ls -i command. For more detailed inode information, use the stat command.
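Both commands report the same inode number for a given file, which is easy to confirm. A sketch assuming Linux with a stat that supports -c (GNU coreutils or BusyBox); the file and variable names are illustrative.

```sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo data > file

# ls -i prints the inode number before the filename.
inode_from_ls=$(ls -i file | awk '{print $1}')
# stat can print just the inode number with the %i format.
inode_from_stat=$(stat -c %i file)

echo "ls -i: $inode_from_ls, stat: $inode_from_stat"
```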

Chapter 5: How the Linux Kernel Boots

A simplified view of the boot process looks like this:

There are two ways to view the kernel’s boot and runtime diagnostic messages:

Kernel Initialization and Boot Options

Upon startup, the Linux kernel initializes in this general order:

When starting the Linux kernel, the boot loader passes in a set of text-based kernel parameters that tell the kernel how it should start. On a running system, you can view these parameters in /proc/cmdline. The most important parameter is the root parameter, which is the location of the root filesystem (without it, the kernel cannot find init and therefore cannot perform the user space start). Upon encountering a parameter that it does not understand, the Linux kernel saves the parameter and later passes it to init when performing the user space start.

A boot loader loads the kernel into memory and starts it with a set of kernel parameters. The kernel and its parameters are usually somewhere on the root filesystem. The boot loader can load the kernel even though the kernel and its disk drivers are not yet running, because nearly all disk hardware has firmware that allows the BIOS to access attached storage hardware with Logical Block Addressing (LBA).

Boot Loader Tasks

A Linux boot loader’s core functionality includes the ability to do the following:

GRUB (Grand Unified Boot Loader) is a near-universal standard on Linux systems. Its filesystem navigation makes it much easier to select a kernel image and configuration. GRUB doesn't really use the Linux kernel; it starts it.

Exploring Devices and Partitions with the GRUB Command Line

GRUB has its own device-addressing scheme, with disks named hdX, where X is 0, 1, and so on. GRUB also has a command line (access it by pressing c at the boot menu), where you can explore devices and partitions.

The GRUB configuration directory contains the central configuration file (grub.cfg) and numerous loadable modules with a .mod suffix. The directory is usually /boot/grub or /boot/grub2. Use the grub-mkconfig command to regenerate grub.cfg rather than editing the file by hand.

Reviewing grub.cfg

The grub.cfg file consists of GRUB commands, which usually begin with a number of initialization steps followed by a series of menu entries for different kernel and boot configurations. Later in this file you should see the available boot configurations, each beginning with the menuentry command.

Generating a New Configuration File

To make changes to your GRUB configuration, add your new configuration elsewhere, then run grub-mkconfig to generate the new configuration. Every file in /etc/grub.d is a shell script that produces a piece of the grub.cfg file. The grub-mkconfig command itself is a shell script that runs everything in /etc/grub.d.

Installing GRUB on Your System

GRUB comes with a utility called grub-install, which performs most of the work of installing the GRUB files and configuration for you.

Installing GRUB on an External Storage Device

To install GRUB on a storage device outside the current system, you must manually specify the GRUB directory on that device as your current system now sees it: grub-install --boot-directory=

Installing GRUB with UEFI

UEFI installation is supposed to be easier, because all you need to do is copy the boot loader into place. But you also need to "announce" the boot loader to the firmware with the efibootmgr command, which grub-install runs for you if it is available: grub-install --efi-directory=efi_dir --bootloader-id=name

Chainloading Other Operating Systems

UEFI makes it relatively easy to support loading other operating systems because you can install multiple boot loaders in the EFI partition, but the older MBR style doesn't support this.

Boot Loader Details

The Master Boot Record (MBR) includes a small area (441 bytes) that the PC BIOS loads and executes after its Power-On Self-Test (POST), but usually additional space is necessary, resulting in what is sometimes called a multi-stage boot loader (MBR does nothing other than load the rest of the boot loader code).

The Extensible Firmware Interface (EFI), whose current standard is Unified EFI (UEFI), relies on a special filesystem called the EFI System Partition (ESP), which contains a directory named efi. Each boot loader has its own identifier and a corresponding subdirectory. A boot loader file has an .efi extension and resides in one of these subdirectories, along with other supporting files.

In summary, GRUB does the following:

Chapter 6: How User Space Starts

User space starts in roughly this order:

Introduction to init

The init program is a user-space program like any other program on the Linux system, and you’ll find it in /sbin along with other system binaries. Its main purpose is to start and stop the essential service processes on the system. There are three major implementations:

System V Runlevels

At any given time on a Linux system, a certain base set of processes (such as crond and udevd) is running. In System V init, this state of the machine is called its runlevel, which is denoted by a number from 0 through 6. You can check your system's runlevel with the who -r command.

Identifying your init

The systemd init handles the regular boot process and aims to incorporate a number of standard Unix services (cron, inetd). It has the ability to defer the start of services and operating system features until they are necessary.
