- What Is Windows File System? [Help]
- Quick Navigation :
- About Windows File System
- Types of Windows File System
- FAT32 in Windows
- NTFS in Windows
- exFAT in Windows
- Comparisons among the Three Types of Windows File System
- Compatibility
- Security
- Supported Volume Size
- Supported File Size
- File System Conversion
- What Exactly Is a File System?
- Definition of file system, what they’re for, and common ones used today
- Windows File Systems
- More About File Systems
- What is the Windows 10 File System?
- Why Doesn’t Windows 10 Use ReFS?
- What Does the Windows 10 File System Mean for Data Recovery?
- What Is a File System? Types of Computer File Systems and How they Work – Explained with Examples
- Reza Lavarian
- What is a file system?
- Why do we need a file system in the first place, you may ask?
- Everything begins with partitioning
- Partitioning schemes, system firmwares, and booting
- How it started, how it’s going
- Architecture of file systems
- What does it mean to mount a file system?
- File Metadata
- Space Management
- Size vs size on disk
- What is disk fragmentation?
- Should We Care About Fragmentation these days?
- Directories
- Filename Rules
- File Size Rules
- File Explorers
- File access management
- Maintaining data integrity
- Database File Systems
- Wrapping Up
- Reza Lavarian
What Is Windows File System? [Help]
Quick Navigation :
About Windows File System
What is file system? Have you ever paid attention to it? This article aims to introduce Windows file system to you.
In computing, file system controls how data is stored and retrieved. In other words, it is the method and data structure that an operating system uses to keep track of files on a disk or partition.
It separates the data we put in computer into pieces and gives each piece a name, so the data is easily isolated and identified.
Without file system, information saved in a storage media would be one large body of data with no way to tell where the information begins and ends.
Types of Windows File System
Knowing what is file system, let’s learn about the types of Windows file system.
There are five types of Windows file system, such as FAT12, FAT16, FAT32, NTFS and exFAT. Most of us like to choose the latter three, and I would like to introduce them respectively for you.
FAT32 in Windows
In order to overcome the limited volume size of FAT16 (its supported maximum volume size is 2GB) Microsoft designed a new version of the file system FAT32, which then becomes the most frequently used version of the FAT (File Allocation Table) file system.
NTFS in Windows
NTFS is the newer drive format. Its full name is New Technology File System. Starting with Windows NT 3.1, it is the default file system of the Windows NT family.
Microsoft has released five versions of NTFS, namely v1.0, v1.1, v1.2, v3.0, and v3.1.
exFAT in Windows
exFAT (Extended File Allocation Table) was designed by Microsoft back in 2006 and was a part of the company’s Windows CE 6.0 operating system.
This file system was created to be used on flash drives like USB memory sticks and SD cards, which gives a hint for its precursors: FAT32 and FAT16.
Comparisons among the Three Types of Windows File System
Everything comes in advantages and shortcomings. Comparisons among the three types of Windows File System will be showed in following content to help you make a choice about selecting one type of file system.
Compatibility
The three types can work in all versions of Windows.
For FAT32, it also works in game consoles and particularly anything with a USB port; for exFAT, it requires additional software on Linux; for NTFS, it is read only by default with Mac, and may be read only by default with some Linux distributions.
With respect to the ideal use, FAT32 is used on removable drives like USB and Storage Card; exFAT is used for USB flash drives and other external drivers, especially if you need files of more than 4 GB in size; NTFS can be used for servers.
Security
The files belonging to FAT32 and NTFS can be encrypted, but the flies belong to the latter can be compressed.
The encryption and compression in Windows are very useful. If other users do not use your user name to login Windows system, they will fail to open the encrypted and compressed files that created with your user name.
In other word, after some files are encrypted, such files only can be opened when people use your account to login Windows system.
Supported Volume Size
For FAT32, the partition size is no larger than 2TB, which means you cannot format a hard drive larger than 2TB as a single FAT32 partition. NTFS allows you use 64KB clusters to achieve a 256TB volume. In theory, you can achieve a 16EB volume of exFAT.
Supported File Size
For FAT32, it fails to support the single files whose size is over 4GB, while NTFS file system can support the size of single file more than 4GB, and for exFAT, the maximum size of single file, in theory, is 16EB.
In conclusion, compared with NTFS and exFAT, FAT32 comes in higher compatibility in old operating systems and removable storage devices, whereas its features limit in single file size and partition size.
Compared with FAT32 and exFAT, NTFS surpasses in security. And exFAT features larger volume volume size and single file size.
File System Conversion
Maybe you already have a hard drive featuring FAT32 or NTFS file system, and you want to make a conversion. In this situation, you can download MiniTool Partition Wizard to help you complete this conversion.
What Exactly Is a File System?
Definition of file system, what they’re for, and common ones used today
Computers use particular kinds of file systems to store and organize data on media, such as a hard drive, the CDs, DVDs, and BDs in an optical drive or on a flash drive.
A file system can be thought of as an index or database containing the physical location of every piece of data on the hard drive or another storage device. The data is usually organized in folders called directories, which can contain other folders and files.
Any place that a computer or other electronic device stores data employs some type of file system. This includes your Windows computer, your Mac, your smartphone, your bank’s ATM—even the computer in your car!
Windows File Systems
The Microsoft Windows operating systems have always supported various versions of the FAT file system. FAT stands for File Allocation Table, a term that describes what it does: maintains a table of each file’s space allocation.
In addition to FAT, all Windows operating systems since Windows NT support a newer file system called NTFS—New Technology File System. For Windows NT, the NT stood for new technology.
All modern versions of Windows also support exFAT, which is designed for flash drives.
ReFS (Resilient File System) is a newer file system for Windows 10 and Windows 8 that includes features not available with NTFS, but it’s currently limited in several ways. You can see which versions of Windows support each version of ReFS in this table.
A file system is set up on a drive during a format. See How to Format a Hard Drive for more information.
More About File Systems
Files on a storage device are kept in sectors. Sectors marked as unused can store data, typically in groups of sectors called blocks. It’s the file system that identifies the size and position of the files as well as which sectors are ready to be used.
Over time, because of the way the file system stores data, writing to and deleting from a storage device causes fragmentation because of the gaps that inevitably occur between different parts of a file. A free defrag utility can help fix that.
Without a structure for organizing files, it not only would be next to impossible to remove installed programs and retrieve specific files, but no two files could exist with the same name because everything might be in the same folder (which is one reason folders are so useful).
What’s meant by files with the same name is like an image, for example. The file IMG123.jpg can exist in hundreds of folders because each folder is used to separate the file, so there isn’t a conflict. However, files can’t bear the same name if they’re in the same directory.
A file system doesn’t just store the files but also information about them, like the sector block size, fragment information, file size, attributes, file name, file location, and directory hierarchy.
Some operating systems other than Windows also take advantage of FAT and NTFS but many different kinds of file systems dot the operating-system horizon, like HFS+ used in Apple product like iOS and macOS. Wikipedia has a comprehensive list of file systems if you’re more interested in the topic.
Sometimes, the term «file system» is used in the context of partitions. For example, saying «there are two file systems on my hard drive» doesn’t mean that the drive is split between NTFS and FAT, but that there are two separate partitions that use the same physical disk.
Most applications you come into contact with require a file system in order to work, so every partition should have one. Also, programs are file system-dependant, meaning you can’t use a program on Windows if it was built for use in macOS.
What is the Windows 10 File System?
We recently received this question from a customer:
“What file system does Windows 10 use? I’ve heard that Microsoft is upgrading to ReFS, and I’m worried about making a switch.”
This is a good question with an easy answer: as is the case with most other consumer versions of Windows, the Windows 10 file system is NTFS (New Technology File System).
This isn’t much of a surprise. NTFS has been the standard file system for Windows computers for years, and it’s been around since the release of Windows NT 3.1 way back in 1993. It’s currently the most common file system in the world.
Why Doesn’t Windows 10 Use ReFS?
If you’ve been following the development of Microsoft’s next-generation file system, you might wonder why the Windows 10 file system isn’t ReFS by default, and when the company plans to make the switch.
ReFS (Resilient File System) improves on NTFS significantly, as it has features like data scrubbing (an error correction technique that prevents data degradation) and improved structural reliability (through B+ trees for on-disk structures).
ReFS lives up to its name, especially for larger Microsoft servers, and it’s generally considered to be a major improvement in terms of reliability. What’s more, it’s been available for a while; ReFS was launched in 2012 with Windows Server 2012.
So, why isn’t it a fundamental part of Microsoft’s flagship OS?
The short answer is that it isn’t ready for consumers. ReFS is designed to run on Storage Spaces, virtual drives in Windows Server, and while it’s stable for this purpose, it’s not nearly ready to replace NTFS as a standard file system for everyday computing. Microsoft will continue to update ReFS, and we should see it introduced to the masses within the next few years (most likely with a host of new features).
However, there’s no reason to complain — NTFS is perfectly fine for consumer use, and it wouldn’t have been a great idea to upgrade the file system just for the sake of making a change.
What Does the Windows 10 File System Mean for Data Recovery?
To recover data, engineers will often need to use appropriate software tools designed for the file system of the target data.
We develop most of our own utilities in our laboratories, and we can recover data from virtually any file system including NTFS and ReFS. Datarecovery.com has recovered thousands of servers and personal computers, and we have already ran a few simulations to make sure that we’re equipped for Windows 10 data recovery.
Since Windows 10 is a new operating system, however, we do not recommend running any file repair software if you experience issues. If you try to recover data using software designed for Windows 8 or older versions of Windows, you could overwrite files permanently.
Call us at 1-800-237-4200 to set up a free evaluation for Windows 10 data recovery or to speak with one of our data experts.
What Is a File System? Types of Computer File Systems and How they Work – Explained with Examples
Reza Lavarian
It’s a bit tricky to explain what exactly a file system is in just one sentence.
That’s why I decided to write an article about it. This post is meant to be a high-level overview of file systems, but I’ll sneak into the lower-level concepts as well. As long as it doesn’t get boring. 🙂
What is a file system?
Let’s start with a simple definition:
A file system defines how files are named, stored, and retrieved from a storage device.
When people talk about file systems, they might refer to different aspects of a file system depending on the context — that’s where things start to seem knotty.
And you might end up asking yourself, WHAT IS A FILE SYSTEM ANYWAY? 🤯
In this guide, I’ll help you understand file systems and any conversation about file systems. I’ll also cover partitioning and booting as well, to help you understand the concepts surrounding file systems.
To keep this guide manageable, I’ll concentrate on Unix-like environments when explaining the lower-level structures or console commands. However, the concepts remain relevant to other environments and file systems.
Why do we need a file system in the first place, you may ask?
Well, without a file system, the storage device would contain a big hunk of data stored back to back, and you wouldn’t be able to tell them apart.
The term file system takes its name from the old paper-based data management systems, where we kept documents as files, and put them into directories.
Imagine a room with piles of papers scattered all over the place.
A storage device without a file system would be in the same situation — and it would be a useless electronic device.
However, a file system changes everything:
A file system isn’t just a bookkeeping feature, though.
Space management, metadata, data encryption, file access control, and data integrity are the responsibilities of file system too.
Everything begins with partitioning
Storage devices must be partitioned and formatted before the first use.
But what is partitioning?
Partitioning is splitting a storage device into several logical regions, so they can be managed separately as if they are separate storage devices.
Partitioning is done by a disk management tool provided by operating systems, or a as text-based CLI tool provided by the system’s firmware.
A storage device should have at least one partition, or more, if needed.
For example, a basic Linux installation has three partitions: one partition dedicated to the operating system, one partition for user files, and a swap partition.
Windows and Mac OS also have similar layout, although they don’t use a dedicated swap partition. Instead, they manage swapping within the partition the operating system is installed on.
So why should we split the storage devices into multiple partitions?
The reason is that we don’t want to manage and use the whole storage space as a single unit, and for a single purpose.
It’s just like how we partition our workspace, to separate (and isolate) meeting rooms, conference rooms, and teams.
On a computer with multiple partitions, you can install multiple operating systems and every time choose a different partition to boot up your system with.
The recovery and diagnostic utilities reside in dedicated partitions too.
For instance to boot up a MacBook in recovery mode, you need hold Command + R as soon as you restart (or turn on) your MacBook.
By doing so, you’re instructing the system to boot up with a partition that contains the recovery program.
Partitioning isn’t just a way of installing multiple operating systems and tools, though. It also allows us to keep critical system files apart from ordinary files.
So no matter how many heavy games you install on your computer, it won’t have any affect on the operating system’s performance — since they reside in different partitions.
Back to the office example, having a call center and a tech team in a common area would have a negative effect on both team’s productivity, because each team has their own requirements to be efficient.
For instance, the tech team would appreciate a quieter area.
Some operating systems, like Windows, assign a drive letter (A, B, C, or D) to the partitions. For instance the primary partition on Windows (on which Windows is installed) is referred to as C:, or drive C.
In Unix-like operating systems, however, partitions appear as normal directories under the root directory — we’ll cover this later.
Guess what? W’re taking a detour here ↩️
In the next section, we’ll dive deeper into partitioning, and get to know two concepts that will change your perspective on file systems: system firmwares and booting.
Partitioning schemes, system firmwares, and booting
When partitioning a storage device, we have two partitioning methods to choose from:
- Master boot record (MBR) Scheme
- GUID Partition Table (GPT) Scheme
Regardless of what partitioning scheme you choose, the first few blocks on the storage device will always contain critical data about your partitions.
The system’s firmware uses these data structures to boot up the operating system.
Wait, what is the system firmware? you may ask.
Here’s an explanation:
A firmware is a low-level software embedded into electronic devices to operate the device, or bootstrap another program to operate the device.
Firmware exists in computers, peripherals (keyboards, mouse, and printers), or even electronic home appliances.
In computers, the firmware provides a standard environment for a complex software like an operating system to boot up and work with hardware components.
However, on simpler systems like a printer, the firmware is the main operating system of the device. The printer menu is the interface of its firmware.
Computer firmwares are implemented based on two specifications:
- Basic Input/Output (BIOS)
- Unified Extensible Firmware Interface (UEFI)
Firmwares — BIOS-based or UEFI-based — reside on a non-volatile memory, like a flash ROM attached to the motherboard.
BIOS By Thomas Bresson, Licensed under CC BY 2.0
When you power on your computer, the firmware is the first program to run.
The mission of the firmware (among other things) is to boot up the computer, run the operating system, and pass it the control of the whole system.
A firmware also runs pre-OS environments (with network support), like recovery or diagnostic tools, or even a special shell to run text-based commands.
The first few screens you see before your operating system’s logo appears are the output of your computer’s firmware, verifying the health of hardware components and the memory.
The initial check is confirmed with a beep, indicating everything is good to go.
MBR partitioning scheme is a part of the BIOS specifications, and used by BIOS-based firmwares.
On MBR-partitioned disks, the first sector on the storage device contains essential data to boot up the system.
This sector is called MBR.
MBR contains the following information:
- The boot loader, which is a simple program (in machine code) to initiate the first stage of the booting process
- A partition table, which contains information about your partitions.
BIOS-based firmwares with MBR-partitioned disks boot the system differently than UEFI-based firmwares.
Here’s how it works:
Once the system is powered on, the BIOS firmware starts and loads the content of MBR into the memory, and runs the boot loader inside it.
Having the boot loader and the partition table in a predefined location like MBR enables BIOS to boot up the system without having to deal with any file.
The boot loader code in the MBR takes between 434 bytes to 446 bytes of the MBR (out of 512b). 64 bytes is also allocated to the partition table for a maximum of four partitions.
446 bytes isn’t big enough to accommodate too much code, though. That said, sophisticated boot loaders like GRUB 2 on Linux split their functionality into pieces, or stages.
The smallest piece, which is known as the first-stage boot loader, sits within the MBR.
The first-stage boot loader initiates the next stages of the booting process.
Immediately after the MBR, and before the first partition, there’s a small space, around 1MB, called the MBR gap. It can be used to place another piece of the boot loader, if needed.
A boot loader, such as GRUB 2, uses the MBR gap to store another stage of its functionality. GRUB calls this the stage 1.5 boot loader, which contains a file system driver.
The stage 1.5 enables the next stages of GRUB to work with files, rather than loading raw data from the storage device (like the first-stage boot loader).
The second stage boot loader, which is now file-system-aware, can load the operating system’s boot loader file to boot up the operating system.
This is when the operating system’s logo fades in.
Here’s the layout of an MBR-partition storage device:
An if we magnify the MBR, its content would look like this:
Although MBR is very simple and widely supported, it has some limitations.
MBR’s data structure limits the number of partitions to only four primary partitions.
A common workaround is to make an extended partition beside the primary partitions, as long as the total number of partitions won’t exceed four.
An extended partition can be split into multiple logical partitions.
When making a partition you can choose between primary and extended.
After this is solved, we’ll encounter the second limitation.
Each partition can be maximum of 2TiB.
And wait, there’s more!
The content of MBR sector has no back up 😱, meaning if MBR gets corrupted due to an unexpected reason, we’ll have a useless storage device.
The GPT partitioning scheme is more sophisticated than MBR, and doesn’t have the limitations MBR has.
For instance, you can have as many partitions as your operating systems allows.
And every partition can be the size of the biggest storage device available in the market — actually a lot more.
GPT is gradually replacing MBR, although MBR is still widely supported across both old PC’s and new.
As mentioned earlier, GPT is a part of the UEFI specification, which is replacing the good old BIOS.
That means that UEFI-based firmwares use a GPT-partitioned storage device to perform the booting.
Many hardware and operating systems now support UEFI and use the GPT scheme to partition storage devices.
In the GPT partitioning scheme, the first sector of the storage device is reserved for compatibility reasons with BIOS-based systems. This is because some systems might still use a BIOS-based firmware but have a GPT-partitioned storage device.
This sector is called Protective MBR. (This is where the first-stage boot loader would reside in an MBR-partitioned disk)
After this first sector, the GPT data structures are stored, including the GPT header and the partition entries.
As a backup, the GPT entries and the GPT header are also stored at the end of the storage device, so it can be recovered if the main copy gets corrupted. This backup is called Secondary GPT.
The layout of a GPT-partitioned storage device looks like this:
GUID Partition Table Scheme By Kbolino, Licensed under CC BY-SA 2.5
In GPT, all the booting services (boot loaders, boot managers, pre-os environments, and shells) live in a special partition called EFI System Partition (ESP), which UEFI firmware can use.
ESP even has it own file system, which is a specific version of FAT. On Linux, ESP resides under the /sys/firmware/efi path.
If this path cannot be found on your system, then you firmware is probably a BIOS-based firmware.
To check you can try to change the directory to the ESP mount point, like so:
UEFI-based firmware assumes that the storage device is partitioned with GPT, and looks up the ESP in the GPT partition table.
Once the EFI partition is found, it looks for the configured boot loader, which is normally a file ending with .efi .
UEFI-based firmware gets the booting configuration from NVRAM (which is non-volatile RAM). NVRAM contains the booting settings as well as paths to the operating system boot loaders.
UEFI firmwares are also capable of performing BIOS-boot (to boot the system from an MBR disk) if configured accordingly.
You can use the parted command on Linux to see what partitioning scheme is used for a storage device.
And the output would be something like this:
Based on the above output, the storage device’s ID is /dev/vda with the capacity of 172GB.
The storage device is partitioned based on GPT, and has three partitions. The second and third partitions are formatted based on the FAT32 and EXT4 file systems respectively.
Having a BIOS GRUB partition implies the firmware is a BIOS-based firmware.
Let’s confirm that with the dmidecode command like so:
And the output would be:
When partitioning is done, the partitions should be formatted.
Most operating systems allow you to format a partition based on a set of file systems.
For instance, if you are formatting a partition on Windows, you can choose between FAT32, NTFS, and exFAT file systems.
Formatting involves the creation of various data structures and metadata used to manage files within a partition.
These data structures are one aspect of a file system.
Let’s take the NTFS file system as an example.
When you format a partition to NTFS, the formatting process places the key NTFS data structures, as well as Master file table (MFT), on the partition.
Alright, enough about partitioning and booting. Let’s get back to file systems.
How it started, how it’s going
A file system is a set of data structures, interfaces, abstractions and APIs that work together to manage any type of file on any type of storage device, in a consistent manner.
Each operating system uses a particular file system to manage the files.
In the early days, Microsoft used FAT (FAT12, FAT16, and FAT32) in the MS-DOS and Windows 9x family.
Starting from Windows NT 3.1, Microsoft developed New Technology File System (NTFS), which had many advantages over FAT32, such as supporting bigger files, longer filenames, data encryption, access management, journaling, and a lot more.
NTFS has been the default file system of the Window NT family (2000, XP, Vista, 7, 10, etc) since then.
NTFS isn’t suitable for non-Windows environments, though.
For instance, you can only read the content of an NTFS-formatted storage device (like flash memory) on a Mac OS, but you won’t be able to write anything to it — unless you install a NTFS driver with write support.
Extended File Allocation Table (exFAT) is a lighter version of NTFS created by Microsoft in 2006.
exFAT was designed for high-capacity removable devices, such as external hard disks, USB drives, and memory cards.
exFAT is the default file system used by SDXC cards.
Unlike NTFS, exFAT has read and write support on Non-Windows environments as well , including Mac OS — making it the best cross-platform file system for high-capacity removable storage devices.
So basically, if you have a removable disk you want to use on Windows, Mac, and Linux, you need to format it to exFAT.
Apple has also developed and used various file systems over the years, including
Hierarchical File System (HFS), HFS+, and recently Apple File System (APFS).
Just like NTFS, APFS is a journaling file system, and has been in use since the launch of OS X High Sierra in 2017.
The Extended File System (ext) family of file systems was specifically created for the Linux kernel — the core of the Linux operating system.
The first version of ext was released in 1991 but soon after it was replaced by the second extended file system (ext2) in 1993.
In the 2000s the third extended filesystem (ext3) and fourth extended filesystem (ext4) were developed for Linux with journaling capability.
ext4 is now the default file system in many distributions of Linux, including Debian and Ubuntu.
You can use the findmnt command on Linux to list your ext4-formatted partitions:
The output would be something like:
Architecture of file systems
The anatomy of a file system within an operating system consists of three layers:
- Physical file system
- Virtual file system
- Logical file system.
These layers can be implemented as independent or tightly coupled abstractions.
When people talk about file systems, they refer to one of these layers.
Although these layers are different across operating systems, the concept is pretty much the same.
The physical layer is the concrete implementation of a file system. It is responsible for data storage and retrieval, as well as space management on the storage device.(or more precisely partitions)
The physical file system interacts with the actual storage hardware, via device drivers.
The next layer is the virtual file system, or VFS.
The virtual file system provides a consistent view of various file systems mounted on the same operating system.
So does this mean an operating system can use multiple file systems at the same time?
The answer is yes!
It’s common for a removable storage medium to have a different file system than that of the computer.
For instance, on Windows (which uses NTFS as the main file system), a flash memory might have been formatted to exFAT or FAT32.
That said, the operating system should provide a unified interface — a consistent view — between programs (file explorers and other apps that work with files) and the different mounted file systems (such as NTFS, APFS, EXT4, FAT32, exFAT, and UDF).
For instance, when you open up your file explorer program, you can copy an image from an EXT4 file system, and paste it over to your exFAT-formatted flash memory — without having to know that files are managed differently under the hood.
This convenient layer between the user (you) and the underlying file systems is provided by the VFS.
A VFS defines a contract that all physical file systems must implement to be supported by the operating system.
However, this compliance isn’t built into the file system core, meaning the source code of a file system doesn’t include support for every operating system.
Instead it uses a file system driver to adhere to the VFS rules.
A driver is a special program that enables a software to communicate with another software or hardware.
User programs don’t directly interact with the VFS, though.
Instead, they use a unified API, which sits between programs and the VFS.
Yes, we’re talking about the logical file system.
The logical file system is the user facing part of a file system, which provides an API to enable user programs to perform various file operations, such as OPEN , READ , and WRITE , without having to deal with any storage hardware.
On the other hand, VFS provides a bridge between the logical layer (which programs interact with) and a set of physical file systems.
A high-level architecture of the file system layers
What does it mean to mount a file system?
On Unix-like systems, the VFS assigns a device ID (for example dev/disk1s1 ) to each partition or removable storage device.
Then, it creates a virtual directory tree and puts the content of each device under that directory tree as separate directories.
The act of assigning a directory to a storage device (under the root directory tree) is called mounting, and the assigned directory is called a mount point.
That said, on a Unix-like operating system, all partitions and removable storage devices appear as if they are directories under the root directory.
For instance, on Linux, the mounting point for a removable device (such as a memory card), is /media (relative to the root directory) by default — unless configured otherwise.
That said, once a flash memory is attached to the system, and consequently, auto mounted at the default mounting point ( /media in this case), its content would be available under /media directory .
However, there are times you need to mount a file system manually.
On Linux, it’s done like so:
In the above command, the first parameter is the device ID ( /dev/disk1s1 ), and the second parameter ( /media/usb ) is the mount point.
Please note that the mount point should already exist as a directory.
If it doesn’t, it has to be created first:
File Metadata
File metadata is a data structure that contains data about a file, such as:
- File size
- Timestamps, like creation date, last accessed date, and modification date
- The file’s owner
- The file’s mode (who can do what with the file)
- What blocks on the partition are allocated to the file
- and a lot more
Metadata isn’t stored with the file content, though. Instead, it’s stored in a different place on the disk — but associated with the file.
In Unix-like systems, the metadata is in the form of special data structures, called inode.
Inodes are identified by a unique number called the inode number.
Inodes are associated with files in a data structure called inode tables.
Each file on the storage device has an inode, which contains information about the file, including the address of the blocks allocated to the file.
In an ext4 inode, the address of the allocated blocks is stored as a set of data structures called extents (within the inode).
Each extent contains the address of the first data block allocated to the file, and the number of the next continuous blocks that the file has occupied.
If the file is fragmented, each fragment will have its own extent.
This is different than ext3’s pointer system, which points to individual data blocks via indirect pointers.
Using an extent data structure enables the file system to point to large files without taking up too much space.
Whenever a file is requested, its name is first resolved to an inode number.
Having the inode number, the file system fetches the respective inode from the storage device.
Once the inode is fetched, the file system starts to compose the file from the data blocks stored in the inode.
You can use the df command with the -i parameter on Linux to see the inodes (total, used, and free) in your partitions:
The output would look like this:
As you can see, the partition /dev/vda1 has a total number of 6451200, of which 3% have been used (175101 inodes).
Additionally, to see the inodes associated with files in a directory, you can use the ls command with -il parameters.
And the output would be:
The first column is the inode number associated with each file.
The number of inodes on a partition is decided when the partition is formatted. So as long as there’s free space and there are unused inodes, files can be stored on the storage device.
It’s unlikely that a personal Linux OS would run out of inodes. However, enterprise services that deal with a large number of files (like mail servers) have to manage their inode quota smartly.
On NTFS, the metadata is stored differently, though.
NTFS keeps file information in a special data structure called the Master File Table (MFT).
Every file has at least one entry in MFT, which contains everything about the respective file, including its location on the storage device — similar to inodes.
On most operating systems, general file metadata can be accessed from the graphical user interface as well.
For instance, when you right-click on a file on Mac OS, and select Get Info (Properties in Windows), a window appears with information about the file. This information is fetched from the respective file’s metadata.
Space Management
Storage devices are divided into fixed-sized blocks called sectors.
A sector is the minimum storage unit on a storage device, and is between 512 bytes and 4096 bytes (Advanced Format).
However, file systems use a high-level concept as the storage unit, called blocks.
Blocks are an abstraction over physical sectors and consist of multiple sectors.
Depending on the size of a file, the file system allocates one or more blocks to each file.
Speaking of space management, the file system is aware of every used and unused block on the partitions, so it’ll be able to allocate space to new files or fetch the existing ones when requested.
The most basic storage unit in ext4-formatted partitions is the block.
However, the contiguous blocks are grouped into block groups for easier management.
The layout of a block group within an ext4 partition
Each block group has its own data structures and data blocks.
Here are the data structures a block group can contain:
- Super Block: a metadata repository, which contains meta data about the entire file system, such as total number of blocks in the file system, total blocks in block groups, inodes, and more. Not all block groups contain the super block, though. A certain number of block groups store a copy of the super as a backup.
- Group Descriptors: Group descriptors also contain bookkeeping information for each block group
- Inode Bitmap: Each block group has its own inode quota for storing files. A block bitmap is a data structure used to identify used and unused inodes within the block group. 1 denotes used and 0 denotes unused inode objects.
- Block Bitmap: a data structure used to identify used and unused data blocks within the block group. 1 denotes used and 0 denotes unused data blocks
- Inode Table: a data structure which defines the relation of files and their inodes. The number of inodes stored in this area is related to the block size used by the file system.
- Data Blocks: This is the zone within the block group where file contents are stored.
Ext4 file systems even take one step further (comparing to ext3), and organise block groups into a bigger group called flex block groups.
Each flex box group contains a number block groups.
The data structures of each block group, including the block bitmap, inode bitmap, and inode table, are concatenated and stored in the first block group within the respective flex block group.
Having all the data structures concatenated in one block group (the first one) frees up more contiguous data blocks on other block groups within each flex block group.
The layout of the first block group looks like this:
Layout of the first block in a ext4 flex block group
When a file is being written to a disk it is written to a one or more blocks within a certain block group.
Managing files in block group level significantly improves the performance of the file system.
Size vs size on disk
Have you ever noticed that your file explorer displays two different sizes for each file: size, and size on disk.
Size and Size on disk
Why are size and size on disk slightly different? You may ask.
Here’s an explanation:
We already know that, depending on the file size, one or more blocks are allocated to a file.
One block is the minimum space that can be allocated to a file. This means the remaining space of a partially-filled block cannot be used by another file.
Since the size of the file isn’t an integer multiple of blocks, the last block might be partially used, and the remaining space would remain unused — or would be filled with zeros.
So «size» is basically the actual file size, while “size on disk” is the space it has occupied, even though it’s not using it all.
You can use du command on Linux to see it for yourself.
The output would be something like this:
And to check the size on disk:
Which will result in:
Based on the output, the allocated block is about 4kb, while the actual file size is 623 bytes.
What is disk fragmentation?
Over time, new files are written to the disk, existing files increase in size, are shrunk, or are deleted.
These frequent changes in the storage medium leave many small gaps (empty spaces) between files.
File Fragmentation occurs when a file is stored as fragments on the storage device because the file system cannot find enough contiguous blocks to store the whole file in a row.
An example of a fragmented and non-fragmented file
Let’s make it more clear with an example.
Imagine you have a Word document named myfile.docx .
myfile.docx is initially stored in a few contiguous blocks on the disk, let’s say LBA250 , LBA251 , and LBA252 — by the way, this naming is hypothetical.
Now, if you add more content to myfile.docx and save it, it will need to occupy more blocks on the storage medium.
Since myfile.docx is currently stored on LBA250 , LBA251 , and LBA252 , the new content should preferably sit within LBA253 and so forth — depending on how much space is needed.
However, if LBA253 is already taken by another file (maybe it’s the first block of another file), the remaining content of myfile.docx has to be stored on different blocks somewhere else on the disks, for instance, LBA312 and LBA313 .
File fragmentation puts a burden on the file system because every time a fragmented file is requested by a user program, the file system needs to collect every piece of the file from various locations on disk.
The applies to saving the file back to the disk as well.
The fragmentation might also occur when a file is written to the disk for the first time, probably because the file is huge, and not much space is left on the storage device.
Fragmentation is one of the reasons some operating systems get slow as the file system ages.
Should We Care About Fragmentation these days?
The short answer is: not anymore!
Modern file system use smart algorithms to avoid (or early detect) fragmentation as much as possible.
Ext4 also does some sort of preallocation, which involves reserving blocks for a file before they are actually needed — making sure the file won’t get fragmented if it get bigger over time.
The number of the preallocated blocks is defined in the length field of the file’s extent of its inode object.
Additionally, Ext4 uses an allocation technique called delayed allocation.
The idea is instead of writing to data blocks one at a time during a write, the allocation requests are accumulated in a buffer. Finally, the allocation is done and data is written to the disk.
Not having to call the block allocator on every write request helps the file system make better choices with distributing the available space. For instance, by placing large files apart from smaller files.
Imagine that a small file is located between to large files. Now, if the small file is deleted, it leaves a small space between the two files.
If the big files and small files are kept in different areas on the storage device, when a file deletes small files won’t leave many gaps on the storage device.
Spreading the files out in this manner leaves enough gaps between data blocks, which helps the filesystem manage (and avoid) fragmentation more easily.
Delayed allocation actively reduces fragmentation and increases performance.
Directories
A Directory (Folder in Windows) is a special file used as a logical container to group files and directories.
On NTFS and Ext4, directories and files are treated the same way. That said, directories are just files that have their own inode (on Ext4) or MFT entry (on NTFS).
The inode or MFT entry of a directory contains information about that directory, as well as a collection of entries pointing to the files associated with that directory.
The files aren’t literally contained within the directory, but they are associated to the directory in a way that they appear as directory’s children in a higher level — like a file explorer program.
These entries are called directory entries. Directory entries contain file names mapped to their inode/MFT entry.
In addition to the directory entries, two other entries also exist, . , which points the directory itself, and .. , which points to its parent directory.
On Linux, you can use the ls in a directory to see the directory entries with their associated inode numbers:
And the output would be something like this:
Filename Rules
Some file systems enforce limitations on filenames.
The limitation can be on the length of the filename or filename case sensitivity.
For instance, in NTFS file system, MyFile and myfile refer to the same file, while on EXT4, they point to different files.
Why does this matter? you may ask.
Imagine that you’re creating a web page on your Windows machine. The web page contains your brand logo, a PNG file, like this:
Even if the actual file name is Logo.png (note the capital L), you can still see the image when you open your web page on your web browser.
However, once you deploy it to a Linux server, and view it live, the image would be broken.
Because in Linux (EXT4 file system) logo.png and Logo.png point to two different files.
File Size Rules
One important aspects of file systems is their supported maximum file size.
An old file system like FAT32, (used by MS-DOS +7.1, Windows 9x family, and flash memories) can’t store files more than 4 GB, while its successor, NTFS allows file sizes to be up to 16 EB (1000 TB).
Just like NTFS, exFAT allows a file size of 16 EB as well, which makes it an ideal choice for storing massive data objects, such as video files.
Practically, there’s no limitation on the file size in the exFAT and NTFS file systems.
Linux’s EXT4 and Apple’s APFS support files up to 16 TiB and 8 EiB respectively.
File Explorers
As you know, the logical layer of the file system provides an API to enable user applications to perform file operations, such as read , write , delete , and execute .
The file system’s API is a low-level mechanism, though, designed for computer programs, runtime environments, and shells — not designed for daily use.
That said, operating systems provide convenient file management utilities out of the box for your day-to-day file management. For instance, File Explorer on Windows, Finder on Mac OS, and Nautilus on Ubuntu are examples of file explorers.
These utilities use the logical file system’s API under the hood.
Apart from these GUI tools, operating systems expose the file system’s APIs via the command-line interfaces as well, like Command Prompt on Windows, and Terminal on Mac and Linux.
These text-based interfaces help users do all sorts of file operations in the form of text commands. Just like how we did in the previous examples.
File access management
Not everyone should be able to remove or modify a file they don’t own or update files if they are not authorised to do so.
Modern file systems provide mechanisms to control users’ access and capabilities with regard to files.
The data regarding user permissions and file ownership is stored in a data structure, called Access-Control List (ACL) on Windows, or Access-Control Entries (ACE) on Unix-like operating systems (Linux and Mac OS).
This feature is also available in the CLI, where a user can change file ownerships, or limit permissions of each file right from the command line interface.
For instance, a file owner (on Linux or Mac) can set a file to be available to the public, like so:
777 means everyone can do every operation (read, write, execute) on myfile.txt .
Maintaining data integrity
Let’s suppose that you’ve been working on your thesis for a month now. One day you open the file, make some modifications, and save it.
Once you save the file, your word processor program sends a “write” request to the file system’s API (the logical file system).
The request is eventually passed down to the physical layer to store the file on a number of blocks.
But what if the system crashes while the older version of the file is being replaced with the new version?
Well, in older file systems (like FAT32 or ext2) the data would be corrupted because it was partially written to the disk.
However, this is less likely to happen with modern file systems as they use a technique called journaling.
Journaling file systems record every single operation that’s about to happen in the physical layer but hasn’t happened yet.
The main purpose is to keep track of the changes that haven’t yet been committed to the file system physically.
The journal is a special allocation on disk where each write attempt is first stored as a transaction.
Once the data is physically placed on the storage device, the change is committed to the filesystem.
In case of a system failure, the file system will detect the incomplete transaction and rolls it back as if it never happened.
That said, the new content (that was being written) may still be lost, but the existing data would remain intact.
Modern file systems such as NTFS, APFS, and EXT4 (even EXT3) use journaling to avoid data corruption in case of system failure.
Database File Systems
Typical file systems organise files as directories.
To access a file, you traverse to the respective directory and you have it.
However, in a database file system, there’s no concept of paths and directories.
The database file system is a faceted system which groups files based on various attributes and dimensions.
For instance, MP3 files can be listed by artist, genre, release year, and album — at the same time!
A database file system is more like a high-level application to help you organise and access your files more easily and more efficiently. However, you won’t be able to access the files outside of this application.
A database file system cannot replace a typical file system, though. It’s just a high-level abstraction to handle files efficiently
The iTunes app on Mac OS is a good example of a database file system.
Wrapping Up
Wow! You made to the end, which means you now know a lot more about file systems. But I’m sure this won’t be the end of your file system studies.
So again — can we describe what a file system is and how it works in one sentence?
Let’s finish this post with the brief description I used at the beginning:
A file system defines how files are named, stored, and retrieved from the storage device.
If there’s something that you think is missing or that I’ve gotten wrong, please let me in the comments below.
By the way, if you like more detailed content like this, visit my web site skillupp.tech and follow me on Twitter, because these are the channels I use to share my everyday findings.
Thanks for reading and enjoy learning!
Reza Lavarian
Hey 👋 I’m a software engineer, an author, and an open-source contributor. I enjoy helping people (including myself) decode the complex side of technology. I share my findings on Twitter: @lavary_
If you read this far, tweet to the author to show them you care. Tweet a thanks
Learn to code for free. freeCodeCamp’s open source curriculum has helped more than 40,000 people get jobs as developers. Get started
freeCodeCamp is a donor-supported tax-exempt 501(c)(3) nonprofit organization (United States Federal Tax Identification Number: 82-0779546)
Our mission: to help people learn to code for free. We accomplish this by creating thousands of videos, articles, and interactive coding lessons — all freely available to the public. We also have thousands of freeCodeCamp study groups around the world.
Donations to freeCodeCamp go toward our education initiatives, and help pay for servers, services, and staff.