Fragmentation

The FAT file system does not contain built-in mechanisms which prevent newly written files from becoming scattered across the partition. On volumes where files are created and deleted frequently, or their lengths are often changed, the medium will become increasingly fragmented over time.

While the design of the FAT file system does not cause any organizational overhead in disk structures or reduce the amount of free storage space with increased amounts of fragmentation, as occurs with external fragmentation, the time required to read and write fragmented files will increase, because the operating system has to follow the cluster chains in the FAT (with parts having to be loaded into memory first, in particular on large volumes) and read the corresponding data physically scattered over the whole medium. This reduces the chances for the low-level block device driver to perform multi-sector disk I/O or initiate larger DMA transfers, effectively increasing I/O protocol overhead as well as arm movement and head settle times inside the disk drive. Also, file operations will become slower with growing fragmentation, as it takes increasingly longer for the operating system to find files or free clusters.

Other file systems, such as HPFS or exFAT, use free-space bitmaps that indicate used and available clusters, which can then be quickly looked up in order to find free contiguous areas. Another solution is the linkage of all free clusters into one or more lists (as is done in Unix file systems). Instead, the FAT has to be scanned as an array to find free clusters, which can lead to performance penalties on large disks. In fact, searching for files in large subdirectories or computing the free disk space on FAT volumes is one of the most resource-intensive operations, as it requires reading the directory tables or even the entire FAT linearly (as sketched below). Since the total number of clusters and the size of their entries in the FAT were still small on FAT12 and FAT16 volumes, this could be tolerated most of the time, considering that the introduction of more sophisticated disk structures would also have increased the complexity and memory footprint of the real-mode operating systems with minimum total memory requirements of 128 KiB or less (such as DOS) for which FAT was originally designed and optimized.

With the introduction of FAT32, long seek and scan times became more apparent, particularly on very large volumes. A possible justification for limiting the maximum size of FAT32 partitions was the time required to perform a DIR operation, which always displays the free disk space as the last line. Displaying this line took longer and longer as the number of clusters increased. FAT32 therefore introduced a special file system information sector in which the previously computed amount of free space is preserved over power cycles, so that the free-space counter needs to be recalculated only when a removable FAT32-formatted medium is ejected without first being unmounted, or if the system is switched off without properly shutting down the operating system, a problem mostly visible with pre-ATX-style PCs, on plain DOS systems, and with some battery-powered consumer products.

Huge cluster sizes can also cause internal fragmentation, since files are rarely exact multiples of the cluster size; this problem grows worse with a large number of small files.
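To make the chain-following cost concrete, the following minimal C sketch walks a FAT32 cluster chain from an in-memory copy of the table (the names and the assumption of a fully loaded, valid chain are hypothetical). Each step depends on the previous lookup, so the location of the next fragment is unknown until the current FAT entry has been read:

    #include <stdint.h>

    #define FAT32_EOC 0x0FFFFFF8u   /* entries >= this value mark end of chain */

    /* Count the clusters of one file by following its chain through an
       in-memory FAT32 table. Assumes first_cluster starts a valid chain. */
    static unsigned walk_chain(const uint32_t *fat, uint32_t first_cluster)
    {
        unsigned length = 0;
        uint32_t c = first_cluster;
        while (c < FAT32_EOC) {
            length++;
            c = fat[c] & 0x0FFFFFFFu;   /* upper 4 bits are reserved */
        }
        return length;
    }

On a fragmented volume, each of these clusters may lie anywhere on the medium, so the driver cannot coalesce the individual reads into larger transfers.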
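Computing the free disk space is similarly linear, since FAT marks a free cluster only by a zero entry rather than recording it in a bitmap or free list. A sketch under the same assumptions:

    /* Count free clusters by scanning the whole FAT32 table linearly.
       Data clusters are numbered from 2; entries 0 and 1 are reserved. */
    static uint32_t count_free(const uint32_t *fat, uint32_t data_clusters)
    {
        uint32_t free_count = 0;
        for (uint32_t c = 2; c < data_clusters + 2; c++)
            if ((fat[c] & 0x0FFFFFFFu) == 0)   /* 0 marks a free cluster */
                free_count++;
        return free_count;
    }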
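The FS information sector mentioned above, normally logical sector 1 of a FAT32 volume, preserves that computed value between sessions. Its layout, following Microsoft's published FAT specification, can be expressed as a C structure (the field names here are illustrative):

    #include <stdint.h>

    #pragma pack(push, 1)
    struct fat32_fsinfo {
        uint32_t lead_sig;        /* 0x41615252 ("RRaA") */
        uint8_t  reserved1[480];
        uint32_t struc_sig;       /* 0x61417272 ("rrAa") */
        uint32_t free_count;      /* free clusters; 0xFFFFFFFF = unknown */
        uint32_t next_free;       /* hint where to start searching for free space */
        uint8_t  reserved2[12];
        uint32_t trail_sig;       /* 0xAA550000 */
    };
    #pragma pack(pop)

A driver only needs to fall back to the full FAT scan when free_count holds the "unknown" marker, which is exactly the state left behind by an unclean eject or shutdown.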
Various optimizations and tweaks to the implementation of FAT file system drivers, block device drivers, and disk tools have been devised to overcome most of the performance bottlenecks in the file system's inherent design without having to change the layout of the on-disk structures. They can be divided into on-line and off-line methods and work by trying to avoid fragmentation in the file system in the first place, by deploying methods to better cope with existing fragmentation, and by reordering and optimizing the on-disk structures. With such optimizations in place, the performance of FAT volumes can often reach that of more sophisticated file systems in practical scenarios, while at the same time retaining the advantage of being accessible even on very small or old systems.

DOS 3.0 and higher will not immediately reuse the disk space of deleted files for new allocations but instead search for previously unused space before starting to use the disk space of deleted files as well. This not only helps to maintain the integrity of deleted files for as long as possible but also speeds up file allocations and avoids fragmentation, since never-before-allocated disk space is always unfragmented. DOS accomplishes this by keeping a pointer to the last allocated cluster of each mounted volume in memory and starting the search for free space from this location upwards instead of at the beginning of the FAT, as was still done by DOS 2.x. If the end of the FAT is reached, the search wraps around to continue at the beginning of the FAT until either free space has been found or the original position has been reached again without any free space having been found (a minimal sketch of this next-fit search is given below). These pointers are initialized to point to the start of the FATs after bootup, but on FAT32 volumes, DOS 7.1 and higher will attempt to retrieve the last position from the FS information sector. This mechanism is defeated, however, if an application frequently deletes and recreates temporary files, as the operating system would then try to maintain the integrity of void data, effectively causing more fragmentation in the end. In some DOS versions, a special API function for creating temporary files can be used to avoid this problem.

Additionally, directory entries of deleted files are marked 0xE5 since DOS 3.0. DOS 5.0 and higher will start to reuse these entries only when previously unused directory entries in the table have been used up and the system would otherwise have to expand the table itself (see the second sketch below).

Since DOS 3.3, the operating system provides means to improve the performance of file operations with FASTOPEN, by keeping track of the position of recently opened files or directories in various forms of lists (MS-DOS/PC DOS) or hash tables (DR-DOS), which can reduce file seek and open times significantly. Before DOS 5.0, special care must be taken when using such mechanisms in conjunction with disk defragmentation software that bypasses the file system or disk drivers.

Windows NT allocates disk space to files on FAT in advance, selecting large contiguous areas, but in case of a failure, files which were being appended will appear larger than they were ever written into, with a lot of random data at the end.

Other high-level mechanisms may read in and process larger parts of or the complete FAT on startup or on demand and dynamically build up in-memory tree representations of the volume's file structures different from the on-disk structures. On volumes with many free clusters, this may occupy even less memory than an image of the FAT itself.
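The next-fit strategy described above can be sketched as follows, again assuming an in-memory FAT32 table and hypothetical names; the search starts just past the per-volume hint and wraps around once before reporting the volume full:

    /* Find the next free cluster, next-fit style: continue from the last
       allocation instead of rescanning from the start of the FAT.
       Returns 0 (a reserved cluster number) if no free cluster exists. */
    static uint32_t next_free_cluster(const uint32_t *fat, uint32_t data_clusters,
                                      uint32_t *last_alloc /* per-volume hint */)
    {
        uint32_t start = *last_alloc;
        uint32_t c = start;
        do {
            c++;
            if (c >= data_clusters + 2)
                c = 2;                    /* wrap: data area starts at cluster 2 */
            if ((fat[c] & 0x0FFFFFFFu) == 0) {
                *last_alloc = c;
                return c;                 /* found free space */
            }
        } while (c != start);
        return 0;                         /* wrapped all the way around: full */
    }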
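The directory-entry reuse policy of DOS 5.0 and higher can be sketched in the same hedged fashion: a never-used slot (first byte 0x00, which also terminates the used portion of the table) is taken first, and a deleted slot (0xE5) is reused only as a fallback, which keeps undelete information intact for as long as possible:

    #include <stdint.h>

    #define DIRENT_SIZE 32   /* a FAT directory entry is 32 bytes */

    /* Pick a directory slot for a new entry: prefer never-used slots,
       fall back to deleted ones, return -1 if the table must be expanded. */
    static int pick_dir_slot(const uint8_t *table, int entries)
    {
        int deleted = -1;
        for (int i = 0; i < entries; i++) {
            uint8_t mark = table[i * DIRENT_SIZE];
            if (mark == 0x00)
                return i;        /* never used: everything after is free too */
            if (mark == 0xE5 && deleted < 0)
                deleted = i;     /* remember the first deleted slot */
        }
        return deleted;
    }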
With such in-memory representations, in particular on highly fragmented or filled volumes, seeks become much faster than with linear scans over the actual FAT, even if an image of the FAT were stored in memory. Also, by operating on the logically higher level of files and cluster chains instead of on the sector or track level, it becomes possible to avoid some degree of file fragmentation in the first place or to carry out local file defragmentation and reordering of directory entries based on their names or access patterns in the background.

Some of the perceived problems with fragmentation of FAT file systems also result from performance limitations of the underlying block device drivers, which become more visible the less memory is available for sector buffering and track blocking/deblocking: while the single-tasking DOS had provisions for multi-sector reads and track blocking/deblocking, the operating system and the traditional PC hard disk architecture (only one outstanding I/O request at a time and no DMA transfers) originally did not contain mechanisms which could alleviate fragmentation by asynchronously prefetching the next data while the application was processing the previous chunks. Such features became available later.

Later DOS versions also provided built-in support for look-ahead sector buffering (sketched below) and came with dynamically loadable disk caching programs working on the physical or logical sector level, often utilizing EMS or XMS memory, sometimes providing adaptive caching strategies, or even running in protected mode through DPMS or Cloaking to increase performance by gaining direct access to the cached data in linear memory rather than through conventional DOS APIs. Write-behind caching was often not enabled by default in Microsoft software (if present), given the risk of data loss in the event of a power failure or crash, a risk made more acute by the lack of hardware protection between applications and the system.
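As a rough illustration of look-ahead sector buffering, this hypothetical sketch fetches a run of consecutive sectors on a cache miss so that subsequent sequential reads are served from memory; read_sectors() stands in for an assumed low-level driver entry point, and bounds at the end of the medium are ignored for brevity:

    #include <stdint.h>
    #include <string.h>

    #define SECTOR_SIZE 512
    #define LOOKAHEAD   8                      /* sectors fetched per miss */

    extern int read_sectors(uint32_t lba, unsigned count, void *dst);

    static uint8_t  window[LOOKAHEAD * SECTOR_SIZE];
    static uint32_t window_lba = UINT32_MAX;   /* first LBA held in the window */

    /* Read one sector through the look-ahead window. */
    static int cached_read(uint32_t lba, void *out)
    {
        if (window_lba == UINT32_MAX ||
            lba < window_lba || lba >= window_lba + LOOKAHEAD) {
            if (read_sectors(lba, LOOKAHEAD, window) != 0)
                return -1;                     /* propagate the driver error */
            window_lba = lba;
        }
        memcpy(out, window + (lba - window_lba) * SECTOR_SIZE, SECTOR_SIZE);
        return 0;
    }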