- Create, write, and read a file
- Prerequisites
- Creating a file
- Writing to a file
- Reading from a file
- WriteFile function (fileapi.h)
- Syntax
- Parameters
- Return value
- Remarks
- Synchronization and File Position
- Pipes
- The basics of file systems
- What is a file system?
- File systems of Windows
- File systems of macOS
- File systems of Linux
- ReiserFS
- Btrfs
- File systems of BSD, Solaris, Unix
- Clustered file systems
Create, write, and read a file
Important APIs
Read and write a file using a StorageFile object.
For a complete sample, see the File access sample.
Prerequisites
Understand async programming for Universal Windows Platform (UWP) apps
To learn how to write asynchronous apps in C# or Visual Basic, see Call asynchronous APIs in C# or Visual Basic. To learn how to write asynchronous apps in C++/WinRT, see Concurrency and asynchronous operations with C++/WinRT. To learn how to write asynchronous apps in C++/CX, see Asynchronous programming in C++/CX.
Know how to get the file that you want to read from, write to, or both
You can learn how to get a file by using a file picker in Open files and folders with a picker.
Creating a file
Here’s how to create a file in the app’s local folder. If it already exists, we replace it.
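The following C++/WinRT sketch illustrates this step; the file name sample.txt and the coroutine name are placeholders, and the same pattern applies in C# and Visual Basic.

```cpp
// Minimal sketch: create "sample.txt" in the app's local folder,
// replacing it if it already exists. (File name is illustrative.)
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Storage.h>

using namespace winrt;
using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Storage;

IAsyncAction CreateSampleFileAsync()
{
    // Get the app's local folder.
    StorageFolder localFolder = ApplicationData::Current().LocalFolder();

    // Create the file; replace any existing file with the same name.
    StorageFile sampleFile = co_await localFolder.CreateFileAsync(
        L"sample.txt", CreationCollisionOption::ReplaceExisting);
}
```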
Writing to a file
Here’s how to write to a writable file on disk using the StorageFile class. The common first step for each of the ways of writing to a file (unless you’re writing to the file immediately after creating it) is to get the file with StorageFolder.GetFileAsync.
Writing text to a file
Write text to your file by calling the FileIO.WriteTextAsync method.
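A minimal C++/WinRT sketch of this call, assuming sampleFile is the StorageFile obtained or created earlier (the text written is just an example string):

```cpp
// Sketch: write a text string to a StorageFile with FileIO::WriteTextAsync.
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Storage.h>

using namespace winrt;
using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Storage;

IAsyncAction WriteTextToFileAsync(StorageFile sampleFile)
{
    co_await FileIO::WriteTextAsync(sampleFile, L"Swift as a shadow");
}
```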
Writing bytes to a file by using a buffer (2 steps)
First, call CryptographicBuffer.ConvertStringToBinary to get a buffer of the bytes (based on a string) that you want to write to your file.
Then write the bytes from your buffer to your file by calling the FileIO.WriteBufferAsync method.
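Here is a C++/WinRT sketch of both steps; the string being converted is only an example, and sampleFile is assumed to be the StorageFile obtained earlier.

```cpp
// Sketch: convert a string to a binary buffer, then write the buffer to the file.
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Security.Cryptography.h>
#include <winrt/Windows.Storage.h>
#include <winrt/Windows.Storage.Streams.h>

using namespace winrt;
using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Security::Cryptography;
using namespace winrt::Windows::Storage;
using namespace winrt::Windows::Storage::Streams;

IAsyncAction WriteBytesToFileAsync(StorageFile sampleFile)
{
    // Step 1: get a buffer of the bytes (based on a string) to write.
    IBuffer buffer = CryptographicBuffer::ConvertStringToBinary(
        L"What fools these mortals be", BinaryStringEncoding::Utf8);

    // Step 2: write the bytes from the buffer to the file.
    co_await FileIO::WriteBufferAsync(sampleFile, buffer);
}
```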
Writing text to a file by using a stream (4 steps)
First, open the file by calling the StorageFile.OpenAsync method. It returns a stream of the file’s content when the open operation completes.
Next, get an output stream by calling the IRandomAccessStream.GetOutputStreamAt method from the stream. If you’re using C#, then enclose this in a using statement to manage the output stream’s lifetime. If you’re using C++/WinRT, then you can control its lifetime by enclosing it in a block, or by setting it to nullptr when you’re done with it.
Now add this code (if you’re using C#, within the existing using statement) to write to the output stream by creating a new DataWriter object and calling the DataWriter.WriteString method.
Lastly, add this code (if you’re using C#, within the inner using statement) to save the text to your file with DataWriter.StoreAsync and close the stream with IOutputStream.FlushAsync.
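The four steps map onto the following C++/WinRT sketch; the text written is only an example, and sampleFile is assumed to be the StorageFile obtained earlier.

```cpp
// Sketch of the four stream-writing steps described above.
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Storage.h>
#include <winrt/Windows.Storage.Streams.h>

using namespace winrt;
using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Storage;
using namespace winrt::Windows::Storage::Streams;

IAsyncAction WriteTextViaStreamAsync(StorageFile sampleFile)
{
    // 1. Open the file and get a random-access stream of its content.
    IRandomAccessStream stream = co_await sampleFile.OpenAsync(FileAccessMode::ReadWrite);

    // 2. Get an output stream starting at the beginning of the file.
    IOutputStream outputStream = stream.GetOutputStreamAt(0);

    // 3. Write to the output stream with a DataWriter.
    DataWriter dataWriter{ outputStream };
    dataWriter.WriteString(L"DataWriter has methods to write to various types.");

    // 4. Save the text to the file and flush the stream.
    co_await dataWriter.StoreAsync();
    co_await outputStream.FlushAsync();
}
```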
Best practices for writing to a file
For additional details and best practice guidance, see Best practices for writing to files.
Reading from a file
Here’s how to read from a file on disk using the StorageFile class. The common first step for each of the ways of reading from a file is to get the file with StorageFolder.GetFileAsync.
Reading text from a file
Read text from your file by calling the FileIO.ReadTextAsync method.
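A short C++/WinRT sketch of this call, again assuming sampleFile is the StorageFile obtained with GetFileAsync:

```cpp
// Sketch: read the file's contents as a single text string.
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Storage.h>

using namespace winrt;
using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Storage;

IAsyncAction ReadTextFromFileAsync(StorageFile sampleFile)
{
    hstring text = co_await FileIO::ReadTextAsync(sampleFile);
    // Use 'text' here.
}
```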
Reading text from a file by using a buffer (2 steps)
First, read the file’s contents into a buffer by calling the FileIO.ReadBufferAsync method.
Then use a DataReader object to read first the length of the buffer and then its contents.
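Both steps are shown in this C++/WinRT sketch (sampleFile is assumed to be the StorageFile obtained earlier):

```cpp
// Sketch: read the file into a buffer, then decode it with a DataReader.
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Storage.h>
#include <winrt/Windows.Storage.Streams.h>

using namespace winrt;
using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Storage;
using namespace winrt::Windows::Storage::Streams;

IAsyncAction ReadBufferFromFileAsync(StorageFile sampleFile)
{
    // Step 1: read the file's contents into a buffer.
    IBuffer buffer = co_await FileIO::ReadBufferAsync(sampleFile);

    // Step 2: read the buffer's length, then its contents, with a DataReader.
    DataReader dataReader = DataReader::FromBuffer(buffer);
    hstring text = dataReader.ReadString(buffer.Length());
}
```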
Reading text from a file by using a stream (4 steps)
WriteFile function (fileapi.h)
Writes data to the specified file or input/output (I/O) device.
This function is designed for both synchronous and asynchronous operation. For a similar function designed solely for asynchronous operation, see WriteFileEx.
Syntax
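The function is declared in fileapi.h as follows (SAL annotations omitted):

```cpp
BOOL WriteFile(
  HANDLE       hFile,
  LPCVOID      lpBuffer,
  DWORD        nNumberOfBytesToWrite,
  LPDWORD      lpNumberOfBytesWritten,
  LPOVERLAPPED lpOverlapped
);
```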
Parameters
[in] hFile
A handle to the file or I/O device (for example, a file, file stream, physical disk, volume, console buffer, tape drive, socket, communications resource, mailslot, or pipe).
The hFile parameter must have been created with write access. For more information, see Generic Access Rights and File Security and Access Rights.
For asynchronous write operations, hFile can be any handle opened with the CreateFile function using the FILE_FLAG_OVERLAPPED flag or a socket handle returned by the socket or accept function.
[in] lpBuffer
A pointer to the buffer containing the data to be written to the file or device.
This buffer must remain valid for the duration of the write operation. The caller must not use this buffer until the write operation is completed.
[in] nNumberOfBytesToWrite
The number of bytes to be written to the file or device.
A value of zero specifies a null write operation. The behavior of a null write operation depends on the underlying file system or communications technology.
Windows Server 2003 and Windows XP: Pipe write operations across a network are limited in size per write. The amount varies per platform. For x86 platforms it’s 63.97 MB. For x64 platforms it’s 31.97 MB. For Itanium it’s 63.95 MB. For more information regarding pipes, see the Remarks section.
[out, optional] lpNumberOfBytesWritten
A pointer to the variable that receives the number of bytes written when using a synchronous hFile parameter. WriteFile sets this value to zero before doing any work or error checking. Use NULL for this parameter if this is an asynchronous operation to avoid potentially erroneous results.
This parameter can be NULL only when the lpOverlapped parameter is not NULL.
For more information, see the Remarks section.
[in, out, optional] lpOverlapped
A pointer to an OVERLAPPED structure is required if the hFile parameter was opened with FILE_FLAG_OVERLAPPED, otherwise this parameter can be NULL.
For an hFile that supports byte offsets, if you use this parameter you must specify a byte offset at which to start writing to the file or device. This offset is specified by setting the Offset and OffsetHigh members of the OVERLAPPED structure. For an hFile that does not support byte offsets, Offset and OffsetHigh are ignored.
To write to the end of file, specify both the Offset and OffsetHigh members of the OVERLAPPED structure as 0xFFFFFFFF. This is functionally equivalent to previously calling the CreateFile function to open hFile using FILE_APPEND_DATA access.
For more information about different combinations of lpOverlapped and FILE_FLAG_OVERLAPPED, see the Remarks section and the Synchronization and File Position section.
Return value
If the function succeeds, the return value is nonzero (TRUE).
If the function fails, or is completing asynchronously, the return value is zero (FALSE). To get extended error information, call the GetLastError function.
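As a rough sketch of checking the return value, the following program performs a synchronous write and reports extended error information on failure; the file name and contents are illustrative only.

```cpp
// Minimal sketch: synchronous WriteFile with error checking.
#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char data[] = "Hello, WriteFile";

    HANDLE hFile = CreateFileA("example.txt", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    DWORD bytesWritten = 0;
    BOOL ok = WriteFile(hFile, data, (DWORD)strlen(data), &bytesWritten, NULL);
    if (!ok)
    {
        // On failure (or asynchronous completion) the return value is FALSE.
        printf("WriteFile failed: %lu\n", GetLastError());
    }
    else
    {
        printf("Wrote %lu bytes\n", bytesWritten);
    }

    CloseHandle(hFile);
    return ok ? 0 : 1;
}
```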
Remarks
The WriteFile function returns when one of the following conditions occurs:
- The number of bytes requested is written.
- A read operation releases buffer space on the read end of the pipe (if the write was blocked). For more information, see the Pipes section.
- An asynchronous handle is being used and the write is occurring asynchronously.
- An error occurs.
The WriteFile function may fail with ERROR_INVALID_USER_BUFFER or ERROR_NOT_ENOUGH_MEMORY whenever there are too many outstanding asynchronous I/O requests.
To cancel all pending asynchronous I/O operations, use either:
- CancelIo—this function cancels only operations issued by the calling thread for the specified file handle.
- CancelIoEx—this function cancels all operations issued by the threads for the specified file handle.
Use the CancelSynchronousIo function to cancel pending synchronous I/O operations.
I/O operations that are canceled complete with the error ERROR_OPERATION_ABORTED.
The WriteFile function may fail with ERROR_NOT_ENOUGH_QUOTA, which means the calling process’s buffer could not be page-locked. For more information, see SetProcessWorkingSetSize.
If part of the file is locked by another process and the write operation overlaps the locked portion, WriteFile fails.
When writing to a file, the last write time is not fully updated until all handles used for writing have been closed. Therefore, to ensure an accurate last write time, close the file handle immediately after writing to the file.
Accessing the output buffer while a write operation is using the buffer may lead to corruption of the data written from that buffer. Applications must not write to, reallocate, or free the output buffer that a write operation is using until the write operation completes. This can be particularly problematic when using an asynchronous file handle. Additional information regarding synchronous versus asynchronous file handles can be found later in the Synchronization and File Position section and Synchronous and Asynchronous I/O.
Note that the time stamps may not be updated correctly for a remote file. To ensure consistent results, use unbuffered I/O.
The system interprets zero bytes to write as specifying a null write operation and WriteFile does not truncate or extend the file. To truncate or extend a file, use the SetEndOfFile function.
Characters can be written to the screen buffer using WriteFile with a handle to console output. The exact behavior of the function is determined by the console mode. The data is written to the current cursor position. The cursor position is updated after the write operation. For more information about console handles, see CreateFile.
When writing to a communications device, the behavior of WriteFile is determined by the current communication time-out as set and retrieved by using the SetCommTimeouts and GetCommTimeouts functions. Unpredictable results can occur if you fail to set the time-out values. For more information about communication time-outs, see COMMTIMEOUTS.
Although a single-sector write is atomic, a multi-sector write is not guaranteed to be atomic unless you are using a transaction (that is, the handle created is a transacted handle; for example, a handle created using CreateFileTransacted). Multi-sector writes that are cached may not always be written to the disk right away; therefore, specify FILE_FLAG_WRITE_THROUGH in CreateFile to ensure that an entire multi-sector write is written to the disk without potential caching delays.
If you write directly to a volume that has a mounted file system, you must first obtain exclusive access to the volume. Otherwise, you risk causing data corruption or system instability, because your application’s writes may conflict with other changes coming from the file system and leave the contents of the volume in an inconsistent state. To prevent these problems, the following changes have been made in Windows Vista and later:
- A write on a volume handle will succeed if the volume does not have a mounted file system, or if one of the following conditions is true:
- The sectors to be written to are boot sectors.
- The sectors to be written to reside outside of file system space.
- You have explicitly locked or dismounted the volume by using FSCTL_LOCK_VOLUME or FSCTL_DISMOUNT_VOLUME.
- The volume has no actual file system. (In other words, it has a RAW file system mounted.)
- A write on a disk handle will succeed if one of the following conditions is true:
- The sectors to be written to do not fall within a volume’s extents.
- The sectors to be written to fall within a mounted volume, but you have explicitly locked or dismounted the volume by using FSCTL_LOCK_VOLUME or FSCTL_DISMOUNT_VOLUME.
- The sectors to be written to fall within a volume that has no mounted file system other than RAW.
There are strict requirements for successfully working with files opened with CreateFile using FILE_FLAG_NO_BUFFERING. For details see File Buffering.
If hFile was opened with FILE_FLAG_OVERLAPPED, the following conditions are in effect:
- The lpOverlapped parameter must point to a valid and unique OVERLAPPED structure, otherwise the function can incorrectly report that the write operation is complete.
- The lpNumberOfBytesWritten parameter should be set to NULL. To get the number of bytes written, use the GetOverlappedResult function. If the hFile parameter is associated with an I/O completion port, you can also get the number of bytes written by calling the GetQueuedCompletionStatus function.
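A sketch of an overlapped write that follows these rules: lpOverlapped points to a unique OVERLAPPED structure, lpNumberOfBytesWritten is NULL, and the byte count is obtained from GetOverlappedResult. The file name is illustrative only.

```cpp
// Sketch: asynchronous (overlapped) WriteFile.
#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char data[] = "overlapped write";

    HANDLE hFile = CreateFileA("example.txt", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    OVERLAPPED ov = { 0 };
    ov.Offset = 0;      // byte offset at which to start writing
    ov.OffsetHigh = 0;
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    BOOL ok = WriteFile(hFile, data, (DWORD)strlen(data), NULL, &ov);
    if (!ok && GetLastError() != ERROR_IO_PENDING)
    {
        printf("WriteFile failed: %lu\n", GetLastError());
        CloseHandle(ov.hEvent);
        CloseHandle(hFile);
        return 1;
    }

    // Wait for the operation to finish and get the number of bytes written.
    DWORD bytesWritten = 0;
    if (GetOverlappedResult(hFile, &ov, &bytesWritten, TRUE))
    {
        printf("Wrote %lu bytes\n", bytesWritten);
    }

    CloseHandle(ov.hEvent);
    CloseHandle(hFile);
    return 0;
}
```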
In Windows Server 2012, this function is supported by the following technologies.
| Technology | Supported |
|---|---|
| Server Message Block (SMB) 3.0 protocol | Yes |
| SMB 3.0 Transparent Failover (TFO) | Yes |
| SMB 3.0 with Scale-out File Shares (SO) | Yes |
| Cluster Shared Volume File System (CsvFS) | Yes |
| Resilient File System (ReFS) | Yes |
Synchronization and File Position
Pipes
If the pipe buffer is full when an application uses the WriteFile function to write to a pipe, the write operation may not finish immediately. The write operation will be completed when a read operation (using the ReadFile function) makes more system buffer space available for the pipe.
When writing to a non-blocking, byte-mode pipe handle with insufficient buffer space, WriteFile returns TRUE with *lpNumberOfBytesWritten < nNumberOfBytesToWrite.
The basics of file systems
Presently, the computer market offers a huge variety of options for storing information in digital form. Existing storage devices include internal and external hard drives, memory cards of photo/video cameras, USB flash drives, and RAID sets along with other complex storage systems. Data is kept on them in the form of files – documents, pictures, databases, email messages, etc. – which have to be efficiently organized on the disk and easily retrieved when needed.
The following article provides a general overview of the file system, the major means of data management on any storage, and describes the peculiarities of different file system types.
What is a file system?
Any computer file is stored on a storage medium with a given capacity. In fact, each storage is a linear space for reading, or for both reading and writing, digital information. Each byte of information on it has its offset from the storage start, known as an address, and is referenced by this address. A storage can be presented as a grid with a set of numbered cells (each cell is a single byte). Any file saved to the storage gets its own cells.
Generally, computer storages use a pair of a sector number and an in-sector offset to reference any byte of information on the storage. A sector is a group of bytes (usually 512 bytes), the minimum addressable unit of the physical storage. For example, byte 1040 on a hard disk will be referenced as sector #3 with an in-sector offset of 16 bytes ([sector]+[sector]+[16 bytes]). This scheme is applied to optimize storage addressing and to use a smaller number to refer to any portion of information located on the storage.
To omit the second part of the address (the in-sector offset), files are usually stored starting from a sector boundary and occupy whole sectors (e.g. a 10-byte file occupies a whole sector, a 512-byte file also occupies a whole sector, while a 514-byte file occupies two entire sectors).
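As a small illustration of this addressing scheme (assuming 512-byte sectors), the byte-to-sector translation and the whole-sector rounding can be sketched as follows:

```cpp
// Illustrative sketch: translate a linear byte address into (sector, offset)
// and round a file size up to whole sectors. Sector size assumed to be 512 bytes.
#include <cstdint>
#include <cstdio>

int main()
{
    const uint64_t sectorSize = 512;
    const uint64_t byteAddress = 1040;

    uint64_t sector = byteAddress / sectorSize;         // 2, i.e. the third sector
    uint64_t offsetInSector = byteAddress % sectorSize; // 16

    // A file always occupies whole sectors, so its size is rounded up.
    const uint64_t fileSize = 514;
    uint64_t sectorsUsed = (fileSize + sectorSize - 1) / sectorSize; // 2

    std::printf("byte %llu -> sector index %llu, offset %llu; a %llu-byte file uses %llu sectors\n",
                (unsigned long long)byteAddress, (unsigned long long)sector,
                (unsigned long long)offsetInSector, (unsigned long long)fileSize,
                (unsigned long long)sectorsUsed);
}
```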
Each file is stored in «unused» sectors and can be read later by its known position and size. However, how do we know which sectors are occupied and which are free? Where are the size, position and name of the file stored? This is exactly what the file system is responsible for.
As a whole, a file system is a structured representation of data and a set of metadata describing this data. It is applied to the storage during the format operation. A file system may serve the whole storage or an isolated storage segment – a disk partition. Usually, a file system operates on blocks, not sectors. File system blocks are groups of sectors that optimize storage addressing. Modern file systems generally use block sizes from 1 to 128 sectors (512–65536 bytes). Files are usually stored at the start of a block and take up entire blocks.
Constant write/delete operations in the file system cause its fragmentation. Thus, files are not stored as whole units, but get divided into fragments. For example, suppose a storage is completely occupied by files of about 4 blocks each (e.g. a collection of photos). A user wants to store a file that will take up 8 blocks and therefore deletes the first and the last files. Doing this frees up 8 blocks, but the first 4-block segment is located near the start of the storage, while the second one is near its end. In this case, the 8-block file is split into two parts (4 blocks each) that fill the free-space «holes». The information about both fragments as parts of a single file is stored in the file system.
In addition to user’s files, the file system also contains its own parameters (such as a block size), file descriptors (including file size, file location, its fragments, etc.), file names and directory hierarchy. It may also store security information, extended attributes and other parameters.
To meet diverse user requirements, such as storage performance, stability and reliability, many different file systems have been developed, each able to serve certain purposes more effectively.
File systems of Windows
Microsoft Windows employs two major file systems: NTFS, the primary format most modern versions of this OS use by default, and FAT, which was inherited from old DOS and has exFAT as its later extension. The ReFS file system was also introduced by Microsoft as a new generation file system for server computers starting from Windows Server 2012. The HPFS file system developed by Microsoft together with IBM can be found only on extremely old machines running Windows NT up to 3.5.
FAT (File Allocation Table) is one of the simplest file system types, which has been around since the 1980s. It consists of the file system descriptor sector (boot sector or superblock), the file system block allocation table (referred to as the File Allocation Table) and plain storage space for storing files and folders. Files in FAT are stored in directories. Each directory is an array of 32-byte records, each defining a file or extended attributes of a file (e.g. a long file name). A file record references the first block of the file; each subsequent block can be found through the block allocation table, which is used as a linked list.
The block allocation table contains an array of block descriptors. A zero value indicates that the block is not used and a non-zero one relates to the next block of a file or a special value for the file end.
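The chain lookup can be sketched as follows; this is an illustrative toy table, not a real FAT implementation:

```cpp
// Illustrative sketch: following a file's block chain through a FAT-like table.
// A zero entry means "free"; a special end-of-chain marker terminates the file.
#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
    const uint32_t END_OF_CHAIN = 0xFFFFFFFF; // placeholder end-of-file marker
    // table[i] holds the next block of the file that occupies block i (0 = free).
    std::vector<uint32_t> table = { 0, 0, 5, 0, 0, 7, 0, END_OF_CHAIN };

    // The directory entry of a file records its first block; here, block 2.
    for (uint32_t block = 2; ; block = table[block])
    {
        std::printf("file data block: %u\n", block);
        if (table[block] == END_OF_CHAIN) break;
    }
}
```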
The numbers in FAT12, FAT16 and FAT32 stand for the number of bits used to enumerate a file system block. This means that FAT12 can use up to 4096 different block references, while FAT16 and FAT32 can use up to 65536 and 4294967296 respectively. The actual maximum count of blocks is even less and depends on the implementation of the file system driver.
FAT12 and FAT16 used to be applied to old floppy disks and do not find extensive employment nowadays. FAT32 is still widely used for memory cards and USB sticks. The system is supported by smartphones, digital cameras and other portable devices.
FAT32 can be used on Windows-compatible external storages or disk partitions with a size under 32 GB (Windows cannot create a FAT32 file system larger than 32 GB, although Linux supports sizes up to 2 TB) and doesn’t allow creating files larger than 4 GB. To address this issue, exFAT was introduced, which doesn’t have any realistic limitations on the size of files or partitions and is frequently used on modern external hard drives and SSDs.
NTFS (New Technology File System) was introduced in 1993 with Windows NT and is currently the most common file system for end user computers based on Windows. Most operating systems of the Windows Server line use this format as well.
The file system is quite reliable thanks to journaling and supports many features, including access control, encryption, etc. Each file in NTFS is stored as a file descriptor in the Master File Table together with the file content. The Master File Table contains entries with all information about files: size, allocation, name, etc. The first 16 entries of the Master File Table are reserved for system metadata files, among them the $Bitmap, which keeps record of all free and used clusters, the $LogFile, used for journaling records, and $BadClus, containing information about bad clusters. The first and the last sectors of the file system contain file system settings (the boot record or the superblock). This file system uses 48-bit and 64-bit values to reference files, thus being able to support data storages with extremely high capacity.
ReFS (Resilient File System) is the latest development of Microsoft, introduced with Windows Server 2012 and now also available in Windows 10. The file system architecture differs substantially from other Windows file systems and is mainly organized in the form of a B+-tree. ReFS has high tolerance to failures due to new features included in the system. The most noteworthy one among them is Copy-on-Write (CoW): no metadata is modified without being copied; data is never written over the existing data – it is placed in another area on the disk. After any file modification, a new copy of the metadata is saved to a free area on the storage, and then the system creates a link from the older metadata to the newer copy. Thus, the system keeps a significant quantity of older backups in different places, providing easy file recovery unless that storage space has been overwritten.
HPFS (High Performance File System) was created by Microsoft in cooperation with IBM and introduced with OS/2 1.20 in 1989 as a file system for servers that could provide much better performance compared to FAT. In contrast to FAT, which simply allocates the first free cluster on the disk for a file fragment, HPFS seeks to arrange the file in contiguous blocks, or at least to place its fragments (referred to as extents) as close to each other as possible. At the beginning of HPFS, there are three control blocks occupying 18 sectors: the boot block, the super block and the spare block. The remaining storage space is divided into parts of contiguous sectors referred to as bands, each taking 8 MB. A band has its own sector allocation bitmap showing which sectors in it are occupied (1 – taken, 0 – free). Each file and directory has its own F-Node located close to it on the disk – this structure contains information about the location of the file and its extended attributes. A special directory band located in the center of the disk is used for storing directories, while the directory structure itself is a balanced tree with alphabetical entries.
Hint: The information concerning data recovery perspectives of the file systems used by Windows can be found in the articles on data recovery specificities of different OS and chances for data recovery. For detailed instructions and recommendations, please, read the manual devoted to data recovery from Windows.
File systems of macOS
Apple’s macOS uses two file systems: HFS+, an extension of the legacy HFS file system used on old Macintosh computers, and APFS, the format employed by modern Macs running macOS High Sierra (10.13) and later.
HFS+ used to be the primary file system of Apple desktop products, including Mac computers and iPods, as well as of Apple server products, before it was replaced by APFS in macOS High Sierra. Advanced server products also use the Apple Xsan file system, a clustered file system derived from StorNext and CentraVision.
The HFS+ file system uses B-trees for placing and locating files. Volumes are divided into sectors, typically 512 bytes in size, which are then grouped into allocation blocks, the number of which depends on the size of the entire volume. The information concerning free and used allocation blocks is kept in the Allocation File. All allocation blocks assigned to each file as extents are recorded in the Extents Overflow File. And, finally, all file attributes are listed in the Attributes File. Data reliability is improved through journaling, which makes it possible to keep track of all changes to the system and quickly return it to a working state in case of unexpected events. Among other supported features are hard links to directories, logical volume encryption, access control, data compression, etc.
APFS (Apple File System) aims to address fundamental issues present in its predecessor and was developed to work efficiently with modern flash storages and solid-state drives. This 64-bit file system uses the copy-on-write method to increase performance: each block is copied before the changes to it are applied. It also offers a lot of data integrity and space-saving features. All the file contents and metadata about files and folders, along with other APFS structures, are kept in the APFS Container. The Container Superblock stores information about the number of blocks in the Container, the block size, etc. Information about all allocated and free blocks of the Container is managed with the help of Bitmap Structures. Each volume in the Container has its own Volume Superblock, which provides information about this volume. All files and folders of the volume are recorded in the File and Folder B-Tree, while the Extents B-Tree is responsible for extents – references to file contents (file start, its length in blocks).
Hint: The details related to the possibility of data recovery from these file systems can be found in the articles about the peculiarities of data recovery depending on the operating system and chances for data recovery. If you’re interested in the practical side of the procedure, please, refer to the guide on data recovery from macOS.
File systems of Linux
The open-source nature of Linux encourages implementing, testing and using many different types of file systems. The most popular Linux file systems include:
Ext2, Ext3 and Ext4 are simply different versions of the «native» Linux Ext file system. This file system is under active development and improvement. The Ext3 file system is an extension of Ext2 that adds journaled (transactional) file write operations. Ext4 is a further development of Ext3, extended with support for optimized file allocation information (extents) and extended file attributes. This file system is frequently used as the «root» file system for most Linux installations.
ReiserFS
ReiserFS — an alternative Linux file system optimized for storing a huge number of small files. It offers good file-search performance and compact allocation of files, achieved by storing the tails of files, or very small files themselves, together with the metadata so that large file system blocks are not wasted on them. However, this file system is no longer actively developed and supported.
XFS — a robust journaling file system that was initially created by Silicon Graphics and used by the company’s IRIX servers. In 2001, it made its way to the Linux kernel and is now supported by most Linux distributions, some of which, like Red Hat Enterprise Linux, even use it by default. This file system is optimized for storing very big files and file systems on a single host.
JFS — a file system developed by IBM for the company’s powerful computing systems. JFS1 usually stands for JFS, JFS2 is the second release. Currently, this file system is open-source and implemented in most modern Linux versions.
Btrfs
Btrfs — a file system based on the copy-on-write principle (COW) that was designed by Oracle and has been supported by the mainline Linux kernel since 2009. Btrfs embraces the features of a logical volume manager, being able to span multiple devices, and offers much higher fault tolerance, better scalability, easier administration, etc. together with a number of advanced possibilities.
F2FS – a Linux file system designed by Samsung Electronics that is adapted to the specifics of storage devices based on the NAND flash memory that are widely used in modern smartphones and other computing systems. The file system works on the basis of the log-structured file system approach (LFS) and takes into account such peculiarities of flash storage as constant access time and a limited number of data rewriting cycles. Instead of creating one large chunk for writing, F2FS assembles the blocks into separate chunks (up to 6) that are written concurrently.
The concept of «hard links» used in this family of operating systems makes most Linux file systems similar in that the file name is not regarded as an attribute of the file, but rather as an alias for it in a certain directory. A file object can be linked from many locations, even multiple times from the same directory under different names. This can lead to serious and even insurmountable difficulties in the recovery of file names after file deletion or file system damage.
Hint: The information concerning the possibility of successful recovery of data from the mentioned file systems can be found in the articles describing the specifics of data recovery from different operating systems and chances for data recovery. To get a grasp on how the procedure should be carried out, please, use the manual on data recovery from Linux.
File systems of BSD, Solaris, Unix
The most common file system for these operating systems is UFS (Unix File System) also often referred to as FFS (Fast File System).
Currently, UFS (in different editions) is supported by all Unix-family operating systems and is the major file system of the BSD OS and the Sun Solaris OS. Modern operating systems tend to replace UFS with other file systems (ZFS for Solaris, JFS and derived file systems for Unix, etc.).
Hint: The information about the likelihood of a successful result when it comes to data recovery from these file systems can be found in the articles about OS-specific peculiarities of data recovery and chances for data recovery. The process itself is described in the instruction dedicated to data recovery from Unix, Solaris and BSD.
Clustered file systems
Clustered file systems are used in computer cluster systems. These file systems support distributed storage.
Distributed file systems include:
ZFS – the Sun «Zettabyte File System», a file system developed for distributed storages of the Sun Solaris OS.
Apple Xsan – Apple’s evolution of the CentraVision and later StorNext file systems.
VMFS – the «Virtual Machine File System» developed by VMware for its VMware ESX Server.
GFS – the Red Hat Linux «Global File System».
JFS1 – the original (legacy) design of IBM JFS file system used in older AIX storage systems.
Common properties of these file systems include support for distributed storage, extensibility and modularity.
To learn about other technologies used to store and manipulate data, please, refer to the storage technologies section.