Sysfs linux ��

Виртуальные файловые системы в Linux: зачем они нужны и как они работают? Часть 1

Всем привет! Мы продолжаем запуски новых потоков по уже полюбившимся вам курсам и сейчас спешим сообщить о том, что у нас стартует новый набор по курсу «Администратор Linux», который запустится в конце апреля. К этому событию и будет приурочена новая публикация. С оригиналом материала можно ознакомиться тут.

Виртуальные файловые системы выполняют роль некой волшебной абстракции, которая позволяет философии Linux говорить, что «всё является файлом».

Что такое файловая система? Опираясь на слова одного из первых контрибьюторов и авторов Linux Робера Лава, «Файловая система – это иерархическое хранилище данных, собранное в соответствии с определенной структурой». Как бы то ни было, это определение в равной мере хорошо подходит для VFAT (Virtual File Allocation Table), Git и Cassandra (база данных NoSQL). Так что именно определяет такое понятие, как «файловая система»?

Основы файловой системы

Ядро Linux имеет определенные требования к сущности, которая может считаться файловой системой. Она должна реализовывать методы open() , read() и write() для постоянных объектов, которые имеют имена. С точки зрения объектно-ориентированного программирования, ядро определяет обобщенную файловую систему (generic filesystem) в качестве абстрактного интерфейса, а эти три большие функции считаются «виртуальными» и не имеют конкретного определения. Соответственно, реализация файловой системы по умолчанию называется виртуальной файловой системой (VFS).

Если мы можем открывать, читать и записывать в сущность, то эта сущность считается файлом, как мы видим из примера в консоли сверху.
Феномен VFS лишь подчеркивает наблюдение, характерное для Unix-подобных систем, которое гласит, что «всё является файлом». Подумайте, насколько странно, что тот маленький пример сверху с /dev/console показывает, как на самом деле работает консоль. На картинке изображена интерактивная Bash сессия. Отправка строки в консоль (virtual console device) отображает ее на виртуальном экране. VFS имеет другие, еще более странные свойства. Например, она дает возможность осуществлять поиск по ним.

Знакомые нам системы, такие как ext4, NFS и /proc имеют три важные функции в структуре данных С, которая называется file_operations. Кроме того, определенные файловые системы расширяют и переопределяют функции VFS привычным объектно-ориентированным способом. Как отмечает Роберт Лав, абстракция VFS позволяет пользователям Linux беспечно копировать файлы в или из сторонних операционных систем или абстрактных сущностей, таких как pipes, не беспокоясь об их внутреннем формате данных. Со стороны пользователя (userspace) с помощью системного вызова процесс может копировать из файла в структуры данных ядра с помощью метода read() одной файловой системы, а затем использовать метод write() другой файловой системы для вывода данных.

Определения функций, которые принадлежат к базовым типам VFS, находятся в файлах fs/*.c исходного кода ядра, в то время как подкаталоги fs/ содержат определенные файловые системы. В ядре также содержатся сущности, такие как cgroups , /dev и tmpfs , которые требуются в процессе загрузки и поэтому определяются в подкаталоге ядра init/ . Заметьте, что cgroups , /dev и tmpfs не вызывают «большую тройку» функций file_operations , а напрямую читают и пишут в память.
На приведенной ниже диаграмме показано, как userspace обращается к различным типам файловых систем, обычно монтируемых в системах Linux. Не показаны такие конструкции как pipes , dmesg и POSIX clocks , которые также реализуют структуру file_operations , доступ к которым проходит через слой VFS.

VFS — это «слой оболочки» между системными вызовами и реализациями определенных file_operations , таких как ext4 и procfs . Функции file_operations могут взаимодействовать либо с драйверами устройств, либо с устройствами доступа к памяти. tmpfs , devtmpfs и cgroups не используют file_operations , а напрямую обращаются к памяти.
Существование VFS обеспечивает возможность переиспользовать код, так как основные методы, связанные с файловыми системами, не должны быть повторно реализованы каждым типом файловой системы. Переиспользование кода – широкоприменяемая практика программных инженеров! Однако, если повторно используемый код содержит серьезные ошибки, от них страдают все реализации, которые наследуют общие методы.

/tmp: Простая подсказка

Простой способ обнаружить, что VFS присутствуют в системе – это ввести mount | grep -v sd | grep -v :/ , что покажет все смонтированные ( mounted ) файловые системы, которые не являются резидентами на диске и не NFS, что справедливо на большинстве компьютеров. Одним из перечисленных маунтов ( mounts ) VFS, несомненно, будет /tmp , верно?

Все знают, что хранение /tmp на физическом носителе – безумие! Источник.

Почему нежелательно хранить /tmp на физическом носителе? Потому что файлы в /tmp являются временными, а устройства хранения медленнее, чем память, где создается tmpfs. Более того, физические носители более подвержены износу при перезаписи, чем память. Наконец, файлы в /tmp могут содержать конфиденциальную информацию, поэтому их исчезновение при каждой перезагрузке является неотъемлемой функцией.

К сожалению, некоторые скрипты инсталляции Linux дистрибутивов создают /tmp на устройстве хранения по умолчанию. Не отчаивайтесь, если это произошло и с вашей системой. Выполните несколько простых инструкций с Arch Wiki, чтобы это исправить, и помните о том, что память выделенная для tmpfs , становится недоступной для других целей. Другими словами, система с гигантской tmpfs и большими файлами в ней может израсходовать всю память и упасть. Другая подсказка: во время редактирования файла /etc/fstab , помните о том, что он должен заканчиваться новой строкой, иначе ваша система не загрузится.

Помимо /tmp , VFS (виртуальные файловые системы), которые наиболее знакомы пользователям Linux – это /proc и /sys . ( /dev располагается в общей памяти и не имеет file_operations ). Почему именно эти два компонента? Давайте разберемся в этом вопросе.

procfs создает снимок мгновенного состояния ядра и процессов, которые он контролирует для userspace . В /proc ядро выводит информацию о том, какими средствами оно располагает, например, прерывания, виртуальная память и планировщик. Кроме того, /proc/sys – это место, где параметры, настраиваемые с помощью команды sysctl , доступны для userspace . Статус и статистика отдельных процессов выводится в каталогах /proc/ .

Здесь /proc/meminfo — это пустой файл, который тем не менее содержит ценную информацию.

Поведение /proc файлов показывает, какими непохожими могут быть дисковые файловые системы VFS. С одной стороны, /proc/meminfo содержат информацию, которую можно посмотреть командой free . С другой же, там пусто! Как так получается? Ситуация напоминает знаменитую статью под названием «Существует ли луна, когда на нее никто не смотрит? Реальность и квантовая теория», написанную профессором физики Корнельского университета Дэвидом Мермином в 1985 году. Дело в том, что ядро собирает статистику памяти, когда происходит запрос к /proc , и на самом деле в файлах /proc ничего нет, когда никто туда не смотрит. Как сказал Мермин, «Фундаментальная квантовая доктрина гласит, что измерение, как правило, не выявляет ранее существовавшего значения измеряемого свойства.» (А над вопросом про луну подумайте в качестве домашнего задания!)
Кажущаяся пустота procfs имеет смысл, поскольку располагающаяся там информация динамична. Немного другая ситуация с sysfs . Давайте сравним, сколько файлов размером не менее одного байта есть в /proc и в /sys .

Procfs имеет один файл, а именно экспортированную конфигурацию ядра, которая является исключением, поскольку ее нужно генерировать только один раз за загрузку. С другой стороны, в /sys лежит множество более объемных файлов, многие из которых занимают целую страницу памяти. Обычно файлы sysfs содержат ровно одно число или строку, в отличие от таблиц информации, получаемой при чтении таких файлов, как /proc/meminfo .

Цель sysfs – предоставить свойства доступные для чтения и записи того, что ядро называет «kobjects» в userspace. Единственная цель kobjects – это подсчет ссылок: когда удаляется последняя ссылка на kobject, система восстановит ресурсы, связанные с ним. Тем не менее, /sys составляет большую часть знаменитого «stable ABI для userspace» ядра, которое никто никогда, ни при каких обстоятельствах не может «сломать». Это не означает, что файлы в sysfs статичны, что противоречило бы подсчету ссылок на нестабильные объекты.
Стабильный двоичный интерфейс приложений ядра (kernel’s stable ABI) ограничивает то, что может появиться в /sys , а не то, что на самом деле присутствует в данный конкретный момент. Листинг разрешений на файлы в sysfs обеспечивает понимание того, как конфигурируемые параметры устройств, модулей, файловых систем и т.д. могут быть настроены или прочитаны. Делаем логический вывод, что procfs также является частью stable ABI ядра, хотя это не указано явно в документации.

Файлы в sysfs описывают одно конкретное свойство для каждой сущности и могут быть читаемыми, перезаписываемыми или и то и другое сразу. «0» в файле говорит о том, что SSD не может быть удален.

Вторую часть перевода начнем с того, как наблюдать за VFS с помощью инструментов eBPF и bcc, а сейчас ждем ваши комментарии и традиционно приглашаем на открытый вебинар, который уже 9 апреля проведет наш преподаватель — Владимир Дроздецкий.

Источник

Sysfs linux ��

sysfs is a ram-based filesystem initially based on ramfs. It provides a means to export kernel data structures, their attributes, and the linkages between them to userspace. sysfs is tied inherently to the kobject infrastructure. Please read Documentation/kobject.txt for more information concerning the kobject interface. Using sysfs

sysfs is always compiled in if CONFIG_SYSFS is defined. You can access it by doing: mount -t sysfs sysfs /sys Directory Creation

For every kobject that is registered with the system, a directory is created for it in sysfs. That directory is created as a subdirectory of the kobject’s parent, expressing internal object hierarchies to userspace. Top-level directories in sysfs represent the common ancestors of object hierarchies; i.e. the subsystems the objects belong to. Sysfs internally stores a pointer to the kobject that implements a directory in the kernfs_node object associated with the directory. In the past this kobject pointer has been used by sysfs to do reference counting directly on the kobject whenever the file is opened or closed. With the current sysfs implementation the kobject reference count is only modified directly by the function sysfs_schedule_callback(). Attributes

Attributes can be exported for kobjects in the form of regular files in the filesystem. Sysfs forwards file I/O operations to methods defined for the attributes, providing a means to read and write kernel attributes. Attributes should be ASCII text files, preferably with only one value per file. It is noted that it may not be efficient to contain only one value per file, so it is socially acceptable to express an array of values of the same type. Mixing types, expressing multiple lines of data, and doing fancy formatting of data is heavily frowned upon. Doing these things may get you publicly humiliated and your code rewritten without notice. An attribute definition is simply: struct attribute < char * name; struct module *owner; umode_t mode; >; int sysfs_create_file(struct kobject * kobj, const struct attribute * attr); void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr); A bare attribute contains no means to read or write the value of the attribute. Subsystems are encouraged to define their own attribute structure and wrapper functions for adding and removing attributes for a specific object type. For example, the driver model defines struct device_attribute like: struct device_attribute < struct attribute attr; ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf); ssize_t (*store)(struct device *dev, struct device_attribute *attr, const char *buf, size_t count); >; int device_create_file(struct device *, const struct device_attribute *); void device_remove_file(struct device *, const struct device_attribute *); It also defines this helper for defining device attributes: #define DEVICE_ATTR(_name, _mode, _show, _store) \ struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store) For example, declaring static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo); is equivalent to doing: static struct device_attribute dev_attr_foo = < .attr = < .name = "foo", .mode = S_IWUSR | S_IRUGO, >, .show = show_foo, .store = store_foo, >; Note as stated in include/linux/kernel.h «OTHER_WRITABLE? Generally considered a bad idea.» so trying to set a sysfs file writable for everyone will fail reverting to RO mode for «Others». For the common cases sysfs.h provides convenience macros to make defining attributes easier as well as making code more concise and readable. The above case could be shortened to: static struct device_attribute dev_attr_foo = __ATTR_RW(foo); the list of helpers available to define your wrapper function is: __ATTR_RO(name): assumes default name_show and mode 0444 __ATTR_WO(name): assumes a name_store only and is restricted to mode 0200 that is root write access only. __ATTR_RO_MODE(name, mode): fore more restrictive RO access currently only use case is the EFI System Resource Table (see drivers/firmware/efi/esrt.c) __ATTR_RW(name): assumes default name_show, name_store and setting mode to 0644. __ATTR_NULL: which sets the name to NULL and is used as end of list indicator (see: kernel/workqueue.c) Subsystem-Specific Callbacks

When a subsystem defines a new attribute type, it must implement a set of sysfs operations for forwarding read and write calls to the show and store methods of the attribute owners. struct sysfs_ops < ssize_t (*show)(struct kobject *, struct attribute *, char *); ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t); >; [ Subsystems should have already defined a struct kobj_type as a descriptor for this type, which is where the sysfs_ops pointer is stored. See the kobject documentation for more information. ] When a file is read or written, sysfs calls the appropriate method for the type. The method then translates the generic struct kobject and struct attribute pointers to the appropriate pointer types, and calls the associated methods. To illustrate: #define to_dev(obj) container_of(obj, struct device, kobj) #define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr) static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) < struct device_attribute *dev_attr = to_dev_attr(attr); struct device *dev = to_dev(kobj); ssize_t ret = -EIO; if (dev_attr->show) ret = dev_attr->show(dev, dev_attr, buf); if (ret >= (ssize_t)PAGE_SIZE) < printk("dev_attr_show: %pS returned bad count\n", dev_attr->show); > return ret; > Reading/Writing Attribute Data

To read or write attributes, show() or store() methods must be specified when declaring the attribute. The method types should be as simple as those defined for device attributes: ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf); ssize_t (*store)(struct device *dev, struct device_attribute *attr, const char *buf, size_t count); IOW, they should take only an object, an attribute, and a buffer as parameters. sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the method. Sysfs will call the method exactly once for each read or write. This forces the following behavior on the method implementations: — On read(2), the show() method should fill the entire buffer. Recall that an attribute should only be exporting one value, or an array of similar values, so this shouldn’t be that expensive. This allows userspace to do partial reads and forward seeks arbitrarily over the entire file at will. If userspace seeks back to zero or does a pread(2) with an offset of ‘0’ the show() method will be called again, rearmed, to fill the buffer. — On write(2), sysfs expects the entire buffer to be passed during the first write. Sysfs then passes the entire buffer to the store() method. A terminating null is added after the data on stores. This makes functions like sysfs_streq() safe to use. When writing sysfs files, userspace processes should first read the entire file, modify the values it wishes to change, then write the entire buffer back. Attribute method implementations should operate on an identical buffer when reading and writing values. Other notes: — Writing causes the show() method to be rearmed regardless of current file position. — The buffer will always be PAGE_SIZE bytes in length. On i386, this is 4096. — show() methods should return the number of bytes printed into the buffer. This is the return value of scnprintf(). — show() must not use snprintf() when formatting the value to be returned to user space. If you can guarantee that an overflow will never happen you can use sprintf() otherwise you must use scnprintf(). — store() should return the number of bytes used from the buffer. If the entire buffer has been used, just return the count argument. — show() or store() can always return errors. If a bad value comes through, be sure to return an error. — The object passed to the methods will be pinned in memory via sysfs referencing counting its embedded object. However, the physical entity (e.g. device) the object represents may not be present. Be sure to have a way to check this, if necessary. A very simple (and naive) implementation of a device attribute is: static ssize_t show_name(struct device *dev, struct device_attribute *attr, char *buf) < return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name); > static ssize_t store_name(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) < snprintf(dev->name, sizeof(dev->name), «%.*s», (int)min(count, sizeof(dev->name) — 1), buf); return count; > static DEVICE_ATTR(name, S_IRUGO, show_name, store_name); (Note that the real implementation doesn’t allow userspace to set the name for a device.) Top Level Directory Layout

The sysfs directory arrangement exposes the relationship of kernel data structures. The top level sysfs directory looks like: block/ bus/ class/ dev/ devices/ firmware/ net/ fs/ devices/ contains a filesystem representation of the device tree. It maps directly to the internal kernel device tree, which is a hierarchy of struct device. bus/ contains flat directory layout of the various bus types in the kernel. Each bus’s directory contains two subdirectories: devices/ drivers/ devices/ contains symlinks for each device discovered in the system that point to the device’s directory under root/. drivers/ contains a directory for each device driver that is loaded for devices on that particular bus (this assumes that drivers do not span multiple bus types). fs/ contains a directory for some filesystems. Currently each filesystem wanting to export attributes must create its own hierarchy below fs/ (see ./fuse.txt for an example). dev/ contains two directories char/ and block/. Inside these two directories there are symlinks named : . These symlinks point to the sysfs directory for the given device. /sys/dev provides a quick way to lookup the sysfs interface for a device from the result of a stat(2) operation. More information can driver-model specific features can be found in Documentation/driver-api/driver-model/. TODO: Finish this section. Current Interfaces

The following interface layers currently exist in sysfs: — devices (include/linux/device.h) ———————————- Structure: struct device_attribute < struct attribute attr; ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf); ssize_t (*store)(struct device *dev, struct device_attribute *attr, const char *buf, size_t count); >; Declaring: DEVICE_ATTR(_name, _mode, _show, _store); Creation/Removal: int device_create_file(struct device *dev, const struct device_attribute * attr); void device_remove_file(struct device *dev, const struct device_attribute * attr); — bus drivers (include/linux/device.h) ————————————— Structure: struct bus_attribute < struct attribute attr; ssize_t (*show)(struct bus_type *, char * buf); ssize_t (*store)(struct bus_type *, const char * buf, size_t count); >; Declaring: static BUS_ATTR_RW(name); static BUS_ATTR_RO(name); static BUS_ATTR_WO(name); Creation/Removal: int bus_create_file(struct bus_type *, struct bus_attribute *); void bus_remove_file(struct bus_type *, struct bus_attribute *); — device drivers (include/linux/device.h) —————————————— Structure: struct driver_attribute < struct attribute attr; ssize_t (*show)(struct device_driver *, char * buf); ssize_t (*store)(struct device_driver *, const char * buf, size_t count); >; Declaring: DRIVER_ATTR_RO(_name) DRIVER_ATTR_RW(_name) Creation/Removal: int driver_create_file(struct device_driver *, const struct driver_attribute *); void driver_remove_file(struct device_driver *, const struct driver_attribute *); Documentation

The sysfs directory structure and the attributes in each directory define an ABI between the kernel and user space. As for any ABI, it is important that this ABI is stable and properly documented. All new sysfs attributes must be documented in Documentation/ABI. See also Documentation/ABI/README for more information.

Источник

Sysfs linux ��� ���

Виртуальные файловые системы в Linux: зачем они нужны и как они работают? Часть 1

Sysfs linux ��� ���

Sysfs linux ��

Sysfs linux ��