File name too long: how do we go on living?
In all my long years of practice I had somehow never run into this problem, but in the last few days it has been coming in droves.
The thing is, Windows 7 users produce an enormous number of files whose names are the full titles of documents in Russian, i.e. two bytes per letter. Often such documents cannot be copied to Windows XP or Linux. No, the name is not shortened automatically, not even on Windows XP; you simply get an error: File name too long.
As it turns out, even the new, much-praised btrfs has the same file name length limit: 255 bytes.
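To see how quickly a Cyrillic name reaches that limit, here is a minimal Python sketch. It assumes names are stored as UTF-8, as is usual on Linux; `NAME_LIMIT`, `utf8_len` and `fits` are names made up for this illustration.

```python
NAME_LIMIT = 255  # per-name limit in bytes, as quoted above


def utf8_len(name: str) -> int:
    """Number of bytes the name occupies once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits(name: str) -> bool:
    """True if the encoded name stays within NAME_LIMIT bytes."""
    return utf8_len(name) <= NAME_LIMIT


# Each Cyrillic letter takes two bytes, so only about 127 of them fit.
print(utf8_len("Ж"), fits("Ж" * 127), fits("Ж" * 128))
```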
For now we get by with shortening file names by hand, but that is no solution. How do we go on living? In my opinion, the current Linux limit is unreasonably small.
100,000 LOR users crow, one fixes it, nuff said.
Let the Windows 7 users shorten their file names, what else is there to do )
Good idea: do this thing properly, but in Rust this time.
File systems must be reliable.
Use XFS: extended attributes (man attr) can store arbitrary key:value pairs, where the key can be up to 255 bytes long and the value up to 64 KB. That is, you can shorten the file name by any algorithm you like and keep the original name in an extended attribute.
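A rough sketch of that idea in Python, assuming Linux and a file system with user extended attributes enabled. The attribute key `user.original_name` and both function names are hypothetical, chosen only for this example.

```python
import os

# Hypothetical attribute key; any name in the "user." namespace would do.
ATTR = "user.original_name"


def save_original_name(path: str, original: str) -> None:
    """Store the full original name in an extended attribute of the file."""
    os.setxattr(path, ATTR, original.encode("utf-8"))


def load_original_name(path: str) -> str:
    """Read the original name back from the extended attribute."""
    return os.getxattr(path, ATTR).decode("utf-8")
```

The file itself would carry the shortened name, and a tool that knows about the attribute could show the long one. Note that many copy tools drop extended attributes unless asked to preserve them.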
The limit is fine. Stop naming files any old way. For a while we ran large projects with a huge number of files: schematics, documentation, mechanical designs. Millions of files, and the directory tree holding them was fairly deep and sprawling. In the technical documentation every file was named with the project code per GOST plus an alphanumeric abbreviation that encoded the file's purpose. The long descriptive titles and explanations lived in dedicated reference files, where each code name was accompanied by the very stuff the user is straining to cram into the file name, and in Russian at that.
I think there are small utilities that simply truncate the file name when copying.
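What such a utility would have to do can be sketched in a few lines of Python. This is an illustration, not an existing tool: `shorten_name` is a made-up helper that cuts at a UTF-8 character boundary and tries to keep the extension.

```python
import os


def shorten_name(name: str, limit: int = 255) -> str:
    """Trim name so its UTF-8 encoding is at most limit bytes, keeping the extension."""
    if len(name.encode("utf-8")) <= limit:
        return name
    stem, ext = os.path.splitext(name)
    budget = limit - len(ext.encode("utf-8"))
    if budget <= 0:
        # The extension alone is too long: treat the whole name as the stem.
        stem, ext, budget = name, "", limit
    # Cut the encoded stem, then drop a trailing character that was split in half.
    trimmed = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return trimmed + ext


print(shorten_name("Ж" * 200 + ".doc"))
```

Truncation alone can produce name collisions, so a real tool would probably also append a counter or a short hash.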
File Naming Conventions in Linux
A file name, also called a filename, is a string (i.e., a sequence of characters) that is used to identify a file.
A file is a collection of related information that appears to the user as a single, contiguous block of data and that is retained in storage, e.g., a hard disk drive (HDD), floppy disk, optical disk or magnetic tape. Names are given to files on Unix-like operating systems to enable users to easily identify them and to facilitate finding them again in the future.
However, file names are only a convenience for users, and such operating systems identify files by their inodes, which are numbers that are stored on the HDD in inode tables and which exist for all types of files, rather than by their names or locations in directories.
This is somewhat analogous to the domain names that are used on the Internet to identify web sites. The names are only for the convenience of human users of the system, and each site is identified by the network by a set of numbers referred to as an IP address.
File names in Linux can contain any characters other than (1) a forward slash ( / ), which is reserved for use as the name of the root directory (i.e., the directory that contains all other directories and files) and as a directory separator, and (2) the null character (which is used to terminate segments of text). Spaces are permitted, although they are best avoided because they can be incompatible with legacy software in some cases.
Typically, however, file names only use alphanumeric characters (mostly lower case), underscores, hyphens and periods. Other characters, such as dollar signs, percentage signs and brackets, have special meanings to the shell and can be distracting to work with. File names should never begin with a hyphen.
A relatively small number of file names on a system consist only of upper case characters, such as README, INSTALL, NEWS and AUTHORS. They are usually plain text files that come bundled with programs and are for documentation purposes.
File names were limited to 14 bytes (equivalent to 14 characters) in early UNIX systems. However, modern Unix-like systems support long file names, usually up to 255 bytes in length. File names can be as short as a single character.
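The rules described so far (no slash, no null character, at most 255 bytes) can be sketched as a small Python check. The function name is hypothetical and UTF-8 encoding is assumed. The rejection of `.` and `..` is an addition not mentioned above, included because those two names are reserved directory entries.

```python
def is_valid_linux_name(name: str, limit: int = 255) -> bool:
    """Check a single file name (not a path) against the basic rules."""
    if name in ("", ".", ".."):
        # Empty names are impossible; "." and ".." are reserved entries.
        return False
    if "/" in name or "\0" in name:
        # The slash separates directories; the null character ends the string.
        return False
    # The length limit is counted in bytes, not characters.
    return len(name.encode("utf-8")) <= limit


print(is_valid_linux_name("ghex-2.6.0.tar.gz"))
print(is_valid_linux_name("a/b"))
```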
In some operating systems, such as MS-DOS and the Microsoft Windows systems, file names consist of two parts: a user-designated name and an extension which is determined by the type of file. The two are separated by a period.
Although Unix-like operating systems generally do not require the use of file extensions, they can be convenient and useful. In particular, they can make it easy to identify file types at a glance and to facilitate manipulating groups of files. Files can also have multiple extensions, such as ghex-2.6.0.tar.gz.
File names must be unique within a directory. However, multiple files and directories with the same name can reside in different directories because such files will have different absolute pathnames (i.e., locations relative to the root directory), and thus the system will be able to distinguish them.
In Unix-like operating systems, directories are just a special type of file, and thus their naming conventions are similar to those for ordinary files. The major exception is the root directory, whose name is always a forward slash.
In documentation, it is usually sufficient to refer to files and directories by their names rather than by their absolute pathnames. However, the first tier of directories in the root directory are usually referred to by their absolute pathnames, e.g., /bin, /boot, /etc, /home and /usr.
There are several ways to change the name of a file or directory. One is to use the mv (i.e., move) command. Thus, for example, to change the name of a file named file1 to file2, the following would be used:
mv file1 file2
When working in a GUI (graphical user interface), a name can be changed by using the right mouse button to click on an icon (i.e., small image) representing a file or a directory and selecting the Rename item in the menu that appears. The cursor is moved to the label for that item and the new name can then be typed in.
On a Unix-like operating system a file can have multiple names because of the operating system’s use of inodes instead of names to identify files and directories. Additional names can be provided by using the ln command to create one or more hard links to a file. (Linux and most other modern systems do not allow users to create hard links to directories.)
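The relationship between names and inodes can be observed with a short Python sketch. It uses `os.link`, which does what `ln file1 file2` does; the helper name `demo_hard_link` and the file names are made up for this example.

```python
import os
import tempfile


def demo_hard_link(directory: str):
    """Create a file, give it a second name, and compare the two names."""
    first = os.path.join(directory, "file1")
    second = os.path.join(directory, "file2")
    with open(first, "w") as f:
        f.write("hello\n")
    os.link(first, second)  # same effect as: ln file1 file2
    a, b = os.stat(first), os.stat(second)
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    return same_inode, a.st_nlink  # st_nlink counts the names of the inode


with tempfile.TemporaryDirectory() as tmp:
    print(demo_hard_link(tmp))
```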
Created July 21, 2005.
Copyright © 2005 The Linux Information Project. All Rights Reserved.
Find the longest file name
I have to find the symbolic link which contains the longest folder name in a folder full of symbolic links. So far I have this:
I was wondering if there’s any way to save the folder names while searching, something like this pseudo code:
To make the awk more readable:
awk is great at dealing with delimited data. Since paths are delimited by / s, we use that as the field separator (with the -F switch), track the longest name we’ve seen with a longest variable, and its length in the maxlength variable. Some care and feeding to make the output sane if no links are found I shall leave as an exercise for the reader.
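For readers who do not use awk, the same bookkeeping can be sketched in Python. This is a translation of the idea only, not the awk program from the answer; it takes the link targets as a list of strings and keeps the `longest` and `maxlength` variable names described above.

```python
def longest_component(paths):
    """Return the longest '/'-separated component seen and its length."""
    longest, maxlength = "", 0
    for path in paths:
        for part in path.split("/"):  # "/" plays the role of awk's -F/
            if len(part) > maxlength:
                longest, maxlength = part, len(part)
    return longest, maxlength


print(longest_component(["/usr/share/doc", "/a/verylongfoldername/b"]))
```

With an empty list the function returns an empty name and a length of 0, which covers the case where no links are found.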
defines a function that returns the slashes in the target of the symlink, then you can use it as a sorting method for your globs:
Would list the symlink ( @ ), including hidden ones ( D ) with the deepest target ( O+by_link_depth , reverse-sorts by link depth, and [1] selects the first).
If you only care about the max-depth link target and not the symlink that points to it, you can run zstat +link instead of ls -ld on that symlink, or you could instead define a resolve and by_depth function:
Where +resolve translates the symlink to its target for the glob expansion, and O+by_depth reverse-sorts by depth.
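The same "sort by depth of the link target" idea can be sketched outside zsh. The Python below is an assumption-laden equivalent, not the zsh code from the answer: like the function described above, it measures depth by counting slashes in the target, and `link_depth` and `deepest_link` are hypothetical names.

```python
import os


def link_depth(link: str) -> int:
    """Depth of a symlink's target, measured as the number of slashes in it."""
    return os.readlink(link).count("/")


def deepest_link(directory: str):
    """Return the symlink in directory whose target is deepest, or None."""
    # os.scandir also yields hidden entries, like the D glob qualifier.
    links = [e.path for e in os.scandir(directory) if e.is_symlink()]
    return max(links, key=link_depth, default=None)
```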
With bash (though the code below is in no way bash specific) and GNU utilities (your -printf is already GNU-specific), you could get something approaching with:
To what extent does Linux support file names longer than 255 bytes?
I asked about Linux’s 255-byte file name limitation yesterday, and the answer was that it is a limitation that cannot/will not be easily changed. But I remembered that most Linux distributions support NTFS, whose maximum file name length is 255 UTF-16 characters.
So I created an NTFS partition and tried to give a file a 160-character Japanese name, which takes 480 bytes in UTF-8. I expected this to fail, but it worked, as below. How can it work when the file name is 480 bytes long? Does the 255-byte limitation apply only to certain file systems, while Linux itself can handle file names longer than 255 bytes?
The string is the beginning part of a famous old Japanese essay titled «方丈記». Here is the string.
I had used this web application to count the UTF-8 bytes.
The answer, as often, is “it depends”.
Looking at the NTFS implementation in particular, it reports a maximum file name length of 255 to statvfs callers, so callers which interpret that as a 255-byte limit might pre-emptively avoid file names which would be valid on NTFS. However, most programs don’t check this (or even NAME_MAX ) ahead of time, and rely on ENAMETOOLONG errors to catch names that are too long. In most cases, the important limit is PATH_MAX , not NAME_MAX ; that’s what’s typically used to allocate buffers when manipulating file names (for programs that don’t allocate path buffers dynamically, as expected by OSes like the Hurd, which doesn’t have arbitrary limits).
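A program that does want to look at these limits ahead of time can query them per path. A minimal Python sketch, assuming a POSIX system; `name_limits` and the dictionary keys are made up for this example, while `os.statvfs` and `os.pathconf` are the standard library wrappers for the corresponding calls.

```python
import os


def name_limits(path: str = "."):
    """Report the limits the file system holding path advertises."""
    return {
        "statvfs f_namemax": os.statvfs(path).f_namemax,
        "pathconf NAME_MAX": os.pathconf(path, "PC_NAME_MAX"),
        "pathconf PATH_MAX": os.pathconf(path, "PC_PATH_MAX"),
    }


for key, value in name_limits(".").items():
    print(key, value)
```

As noted above, the reported number does not say whether it counts bytes or 2-byte units, so it is only a hint.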
The NTFS implementation itself doesn’t check file name lengths in bytes, but always as 2-byte characters; file names which can’t be represented in an array of 255 2-byte elements will cause an ENAMETOOLONG error.
Note that NTFS is generally handled by a FUSE driver on Linux. The kernel driver currently only supports UCS-2 characters, but the FUSE driver supports UTF-16 surrogate pairs (with the corresponding reduction in character length).
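The counting rule can be illustrated with a small Python predicate. It only models the length check in 2-byte units and makes no claim about what a particular driver accepts; `fits_ntfs` is a hypothetical name.

```python
def fits_ntfs(name: str, limit: int = 255) -> bool:
    """True if name needs at most limit UTF-16 code units (2 bytes each)."""
    units = len(name.encode("utf-16-le")) // 2
    return units <= limit


# A character outside the Basic Multilingual Plane needs a surrogate pair,
# i.e. two code units, so fewer such characters fit.
print(fits_ntfs("記" * 255), fits_ntfs("😀" * 127), fits_ntfs("😀" * 128))
```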
The limit for the length of a filename is indeed coded inside the filesystem, e.g. ext4 , from https://en.wikipedia.org/wiki/Ext4 :
Max. filename length 255 bytes
Max. filename length 255 bytes
Max. filename length 255 ASCII characters (fewer for multibyte character encodings such as Unicode)
Max. filename length 255 UTF-16 code units
An overview of these limits for a number of file systems can be found at https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits . There you can also see that ReiserFS has a higher limit (almost 4K) but the kernel itself (inside VFS, the kernel virtual filesystem) has the limit of 255 bytes.
Your text uses 160 UTF-16 characters as used in NTFS:
This shows 0x140 = 320 bytes (plus 2 bytes prepended byte order mark (BOM) if used). In other words, 160 UTF-16 characters and therefore below the 255 UTF-16 character limit in NTFS but more than 255 bytes.
(ignoring the newline character here)
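The arithmetic can be reproduced in Python. The string below is a stand-in (one Japanese character repeated 160 times), not the actual text from the question, and `sizes` is a made-up helper.

```python
def sizes(name: str):
    """Return (UTF-16 code units, UTF-16 bytes, UTF-8 bytes) for name."""
    utf16 = name.encode("utf-16-le")  # little-endian, no byte order mark
    return len(utf16) // 2, len(utf16), len(name.encode("utf-8"))


# Stand-in for the 160-character Japanese file name.
units, utf16_bytes, utf8_bytes = sizes("記" * 160)
print(units, hex(utf16_bytes), utf8_bytes)  # 160 0x140 480
```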
So, here’s what I’ve found out.
Coreutils don’t particularly care about filename length and simply work with user input regardless of its length, i.e. there are zero checks.
I.e. this works (filename length in bytes 462!):
Even this works
However once you try to copy the said file to any of your classic Linux filesystems, the operation will fail:
I.e. cp has actually attempted to create this file in /tmp but /tmp doesn’t allow filenames longer than 255 bytes.
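The same failure can be provoked directly. A minimal Python sketch, assuming the temporary directory lives on one of the usual 255-byte file systems; `try_create` is a hypothetical helper, and the 231-letter name is chosen to be 462 bytes in UTF-8 like the example above.

```python
import errno
import os
import tempfile


def try_create(directory: str, name: str):
    """Create an empty file; return None on success or the errno on failure."""
    try:
        with open(os.path.join(directory, name), "x"):
            return None
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as tmp:
    result = try_create(tmp, "я" * 231)  # 462 bytes in UTF-8
    print("name too long:", result == errno.ENAMETOOLONG)
```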
Also I’ve managed to open this file in mousepad (a GTK application), edit and save it — it all worked which means 255 bytes restriction applies only to certain Linux filesystems.
This doesn’t mean everything will work. For instance my favorite console file manager, Midnight Commander, a clone of Norton Commander — cannot list (shows file size as 0), open, or do anything with this file:
There was/is some limit; for example, readdir_r() can’t read file names longer than 255 bytes. However, Linux is aware of that, and modern APIs can read long file names without problems.
There’s this line in ReiserFS wiki
Max. filename length: 4032 bytes, limited to 255 by Linux VFS
so there may be some real limits in VFS although I don’t know enough about Linux VFS to tell. The VFS functions all work on struct dentry which stores names in the struct qstr d_name;
The struct qstr stores hash, length and pointer to the name so I don’t think there are any physical limits unless the VFS functions explicitly truncate the name on creating/opening. I didn’t check the implementation but I think long names should work fine
Update:
The length check is done in linux/fs/libfs.c and ENAMETOOLONG will be returned if the name is too long
The limit is defined in linux/limits.h as NAME_MAX, which is 255
But I have no idea how file names longer than that can be opened without that error
However there are a few system calls that do have limits. struct dirent has the following members
Since d_name is a fixed array, many functions like readdir_r() won’t ever be able to return names longer than 255 bytes. For example
On some systems, readdir_r() can’t read directory entries with very long names. When the glibc implementation encounters such a name, readdir_r() fails with the error ENAMETOOLONG after the final directory entry has been read. On some other systems, readdir_r() may return a success status, but the returned d_name field may not be null terminated or may be truncated.
readdir() OTOH allocates memory for struct dirent itself, so the name can actually be longer than 255 bytes and you must not use sizeof(d_name) and sizeof(struct dirent) to get the name and struct lengths
Note that while the call fpathconf(fd, _PC_NAME_MAX) returns the value 255 for most filesystems, on some filesystems (e.g., CIFS, Windows SMB servers), the null-terminated filename that is (correctly) returned in d_name can actually exceed this size. In such cases, the d_reclen field will contain a value that exceeds the size of the glibc dirent structure shown above.
Some other functions like getdents() use struct linux_dirent and struct linux_dirent64, which don’t suffer from the fixed-length issue
strace ls shows that ls uses getdents() to list files so it can handle file names with arbitrary length
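From a scripting language one can at least check which names in a directory exceed 255 bytes. A minimal Python sketch; `names_over_limit` is a made-up helper, and how the interpreter reads the directory internally is not examined here.

```python
import os


def names_over_limit(directory: str = ".", limit: int = 255):
    """List the raw entry names in directory that are longer than limit bytes."""
    # Passing a bytes path makes os.listdir return the names as raw bytes.
    raw_names = os.listdir(os.fsencode(directory))
    return [name for name in raw_names if len(name) > limit]


print(names_over_limit("."))
```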