Disk fragmentation and disk defrag
Physical Storage Methods
A thorough treatise on physical storage methods is outside the scope of this article; however, it is important to understand a few key concepts about physical storage. The most common forms of physical storage today are magnetic media, such as tapes and hard drives, and optical media, such as compact discs. All of these media suffer from variable seek times; that is, the time taken to get from one location on the media to another is not constant. This is in contrast with other media, such as certain non-volatile RAM, which can have constant seek times. Seek times are an important concept to understand when dealing with disk fragmentation, because fragmentation turns seek time into overhead on every data operation.
In terms of proneness to disk fragmentation, it is atypical to find an example of a fragmented magnetic tape or compact disc. This is because the write operation for these media is usually performed contiguously, and random write access is not provided per se. Hard drives, on the other hand, allow random access to all storage for all operations. Random access encourages disk fragmentation because, whilst read and write operations can be performed quickly, inserting, resizing, moving and copying data cannot. Suppose a user had a full, defragmented hard drive like so:
| File A | File B | File C | File D | File E |
If the user deleted files A and E, they would be left with:
| Free Space | File B | File C | File D | Free Space |
This scenario is called external fragmentation (fragmentation of free disk space). Now, if the user wanted to store a file the size of files A and E combined, they would have enough space to do so, but not contiguously. To store the file contiguously, one would have to shift files B, C and D into the latter free space and then store the new file in the resulting contiguous block of free space. Unfortunately, this can be cripplingly slow if the combined size of files B, C and D is large enough to cause a noticeable delay for the end user. Therefore, in the interests of speed, it is preferable to divide the new file into two pieces and write them to two different locations on the disk, fragmenting it so that the final result looks like this:
| File F (Part One) | File B | File C | File D | File F (Part Two) |
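The walk-through above can be sketched in a few lines of Python. This is only an illustrative model, not how any real file system allocates space: the disk is a list with one block per file, `None` marks free space, and the helper names `free_runs` and `write_file` are made up for this example.

```python
def free_runs(disk):
    """Return (start, length) for each contiguous run of free blocks (None)."""
    runs, start = [], None
    for i, owner in enumerate(disk):
        if owner is None and start is None:
            start = i                      # a free run begins here
        elif owner is not None and start is not None:
            runs.append((start, i - start))  # the free run ended at i - 1
            start = None
    if start is not None:                  # free run reaching the end of the disk
        runs.append((start, len(disk) - start))
    return runs


def write_file(disk, name, size):
    """Write `size` blocks of `name` into the free runs, lowest address first.

    Hypothetical allocator: it never moves existing files, so the new file
    is split whenever no single free run is large enough.
    Returns the list of (start, length) fragments that were written.
    """
    runs = free_runs(disk)
    if sum(length for _, length in runs) < size:
        raise ValueError("not enough free space")
    fragments, remaining = [], size
    for start, length in runs:
        if remaining == 0:
            break
        used = min(length, remaining)
        disk[start:start + used] = [name] * used
        fragments.append((start, used))
        remaining -= used
    return fragments


# One block per file, matching the diagrams above.
disk = ["A", "B", "C", "D", "E"]
disk[0] = None  # delete File A
disk[4] = None  # delete File E
fragments = write_file(disk, "F", 2)
print(disk)       # ['F', 'B', 'C', 'D', 'F']
print(fragments)  # [(0, 1), (4, 1)]
```

The allocator finishes without touching files B, C and D, which is exactly the trade made in the example: the write is fast, but File F ends up in two pieces.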
Now that disk fragmentation has occurred (in this context it is called data fragmentation), we have an example of where seek times become overhead. To write this file, the hard drive had to perform a seek operation at the end of part one before it could begin writing part two. In the real world, an average-sized file with two fragments suffers very little performance penalty, but this is a simple example. The worst cases of disk fragmentation are more likely to occur on servers, which continuously perform random operations on files of arbitrary size, and it is not atypical to find a large file (such as a relational database) that has thousands of fragments. Typical hard drives have an average seek time of around 6 ms, so a single sequential pass over such a file can spend many seconds on seeking alone, and repeated operations on it can add up to delays of minutes. Other candidates for high fragmentation include the paging files used by operating systems. Again, this can mean a significant delay for the end user when performing intensive operations on his or her machine, all caused by file fragmentation and seek times.
This example overlooks one key factor: the mechanism by which the hard drive determines where files are located and whether they are fragmented. This information is managed by a "file system" which, aside from an initial boot sector common to nearly all file systems, is typically specific to the end user's operating system. File systems are discussed in the next section.
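To put rough numbers on this, the following sketch multiplies the 6 ms average seek time quoted above by the number of extra seeks a fragmented file needs. It is a back-of-the-envelope estimate under simplifying assumptions: it counts one seek per additional fragment and ignores rotational latency, caching and transfer time. The function name is made up for this example.

```python
AVERAGE_SEEK_MS = 6.0  # average seek time quoted in the text


def seek_overhead_seconds(fragments, seek_ms=AVERAGE_SEEK_MS):
    """Estimate the extra seek time, in seconds, for one sequential pass.

    A file stored in N fragments needs N - 1 seeks beyond the initial one.
    """
    if fragments < 1:
        raise ValueError("a file has at least one fragment")
    return (fragments - 1) * seek_ms / 1000.0


for n in (1, 2, 1_000, 10_000):
    print(f"{n:>6} fragments: about {seek_overhead_seconds(n):.3f} s of seeking")
```

Under these assumptions a two-fragment file loses a few milliseconds, while a file in ten thousand fragments loses roughly a minute on every full pass.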
Virtual Storage Methods
Again, a detailed explanation is outside the scope of this article, so only a brief one is offered. Virtual storage methods are, in the most basic sense, a structured encapsulation of the data on a storage medium. Typically a virtual storage method is referred to as a "file system", and there are different file systems for different types of storage media. For compact discs there are ISO 9660 and UDF, and for hard drives there are (amongst others) FAT32 and NTFS. For this explanation we will look at NTFS, although the concepts are transferable.
On an NTFS-formatted disk, the data is laid out like so:
| Boot Sector | Master File Table | System Files | File Storage |
The boot sector is a small data segment common to nearly all file systems, because it is necessary for the boot process to start from the hard drive. File systems without boot sectors are rare. Following the boot sector is the master file table, which stores information about files and directories, including where the files are stored and how they are fragmented; most file systems keep a comparable structure. Following the master file table is space allocated for system files. In NTFS, system files are important because they store information about free space, "bad" parts of the disk and other implementation-specific data.
The remaining space on the disk is allocated to files. The disk is divided into very small units called "sectors". Because each sector is typically very small, the file system allocates space to files in larger units called "clusters". The cluster size is always a multiple of the sector size, and a cluster is always a contiguous collection of sectors. Cluster and sector sizes are important when dealing with internal fragmentation, which is discussed in the next section.
Disk Fragmentation
Disk fragmentation is an unfortunate problem with today's mass storage media. It occurs because storage media cannot copy or move data as quickly as they can write it, and cannot store data with precise granularity. The following section of this article discusses disk fragmentation in a general sense, as well as the types of file fragmentation that can occur on a file system.
Causes of disk fragmentation
Disk fragmentation, in all cases, is caused by one of two things. The first (and the more detrimental to performance) is the inability of the storage medium to provide contiguous storage at the moment it is needed, so that fragmented storage is used instead. This inability causes two types of disk fragmentation: external fragmentation and data fragmentation. The second cause is the inability of the file system to store data with fine enough granularity to use all of the space it allocates. This type of disk fragmentation is less detrimental to performance, but results in wasted space.
Types of disk fragmentation
External fragmentation
External fragmentation affects the free space on the hard drive after data has been removed. Ideally, free space would be kept contiguous, so that data can be written to the medium contiguously and without fragmentation. However, if segments of data are removed at arbitrary places along the storage medium, the free space is automatically fragmented. Ideally, a storage medium would be able to shuffle files into that free space and return the freed space to a single contiguous "heap"; however, this is time consuming. This can be seen in defragmentation software, which can take a significant amount of time to defragment a heavily fragmented disk.
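The cost of shuffling files to consolidate free space can be illustrated with a small sketch. It uses the same toy model as the diagrams earlier in the article (a list of blocks, with `None` for free space) and a hypothetical `compact` function; real defragmenters are far more sophisticated, and this one simply slides every used block towards the start of the disk.

```python
def compact(disk):
    """Slide used blocks towards the start so free space forms one run.

    Returns (new_disk, blocks_moved). The relative order of the used
    blocks is preserved; a block counts as moved if its position changes.
    """
    new_disk = [None] * len(disk)
    blocks_moved = 0
    target = 0  # next position to fill on the compacted disk
    for source, owner in enumerate(disk):
        if owner is None:
            continue
        new_disk[target] = owner
        if source != target:
            blocks_moved += 1
        target += 1
    return new_disk, blocks_moved


disk = [None, "B", "C", "D", None]
compacted, moved = compact(disk)
print(compacted)  # ['B', 'C', 'D', None, None]
print(moved)      # 3
```

Even in this tiny case, joining two free blocks means rewriting every file on the disk, which is why consolidation is left to a separate, slow defragmentation pass rather than done on every delete.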
Data fragmentation
Data fragmentation occurs as a result of external fragmentation. If a storage medium has enough free space to store a piece of data, but that free space is externally fragmented, the data may have to be fragmented as well. Data fragmentation shares many of the caveats of external fragmentation: preventing it in real time is more debilitating to performance than the fragmentation itself, and correcting it afterwards is very time consuming.
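A simple way to measure data fragmentation in the toy block model used earlier in this article is to count how many separate runs of blocks a file occupies. The sketch below assumes the disk is a list of block owners and that a file's blocks appear in logical order; `count_fragments` is a name made up for this example.

```python
def count_fragments(disk, name):
    """Count the contiguous runs of blocks owned by `name` on `disk`."""
    fragments = 0
    previous = None
    for owner in disk:
        # A new fragment starts wherever the file's block follows
        # a block that does not belong to the file.
        if owner == name and previous != name:
            fragments += 1
        previous = owner
    return fragments


disk = ["F", "B", "C", "D", "F"]
print(count_fragments(disk, "F"))  # 2
print(count_fragments(disk, "B"))  # 1
```

A count of one means the file is contiguous; anything higher implies at least that many minus one extra seeks when the file is read from start to finish.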
Internal fragmentation
Internal fragmentation occurs when the file system cannot store a file with fine enough granularity to use all of the space it allocates to that file. It exists because coarser granularity allows for increased performance or simplicity when implementing the file system. Internal fragmentation is not usually considered a problem on most computer systems, because optimizing it yields only small performance benefits and involves modifying the file system, which can lead to a host of compatibility issues.
In general it is accepted that the benefits of allowing internal fragmentation, weighed against the penalties of removing it, make it permissible.
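The space lost to internal fragmentation can be estimated with simple arithmetic: a file always occupies a whole number of clusters, so the unused tail of its last cluster is wasted. The sketch below assumes a 4096-byte cluster purely for illustration (actual cluster sizes depend on how the volume was formatted), and the function name is made up for this example.

```python
def slack_bytes(file_size, cluster_size=4096):
    """Bytes allocated to a file but not used by it.

    `cluster_size` defaults to 4096 as an illustrative assumption only.
    """
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size - file_size


for size in (1, 4096, 5000):
    print(f"{size:>5}-byte file wastes {slack_bytes(size)} bytes")
```

With these assumed numbers a 1-byte file wastes almost an entire cluster, while a file that is an exact multiple of the cluster size wastes nothing, so the loss per file is always less than one cluster.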
Why defragment?
Because the defragmentation process can take a considerable amount of time, it may seem like a waste to many users and consumers. However, it is important to consider two factors when arguing the case for defragmentation: performance benefits and hardware benefits.
Performance benefits resulting from a complete defragmentation may not be all that apparent to the average home user. This is because, for the kind of data the average home user deals with, file fragmentation degrades performance by perhaps no more than 5%. However, for servers and applications dealing with large data files, files split into thousands of fragments can make the server or application slow or unstable, as its code is held up waiting for file operations to complete.
Hardware benefits resulting from a complete defragmentation are also not immediately apparent to users; however, it is important to remember that external and data fragmentation cause extraneous seek operations to be performed. These operations wear the drive's seek heads and in some cases the storage medium itself (e.g. seeking on a tape puts stress on it). Defragmentation results in storage media that are more robust and resistant to data failure, which is important to both consumers and industry professionals.