LINFO

Filesystems: A Brief Introduction



The term filesystem has two somewhat different meanings, both of which are commonly used. This can be confusing to novices, but after a while the meaning is usually clear from the context.

One meaning is the entire hierarchy of directories (also referred to as the directory tree) that is used to organize files on a computer system. On Linux and Unix, the directories start with the root directory (designated by a forward slash), which contains a series of subdirectories, each of which, in turn, contains further subdirectories, etc.

A variant of this definition is the part of the entire hierarchy of directories or of the directory tree that is located on a single partition or disk. (A partition is a section of a hard disk that contains a single type of filesystem.)

The second meaning is the type of filesystem, that is, how the storage of data (i.e., files, folders, etc.) is organized on a computer disk (hard disk, floppy disk, CDROM, etc.) or on a partition on a hard disk. Each type of filesystem has its own set of rules for controlling the allocation of disk space to files and for associating data about each file (referred to as metadata) with that file, such as its filename, the directory in which it is located, its permissions and its creation date.

An example of a sentence using the word filesystem in the first sense is: "Alice installed Linux with the filesystem spread over two hard disks rather than on a single hard disk." This refers to the fact that [the entire hierarchy of directories of] Linux can be installed on a single disk or spread over multiple disks, including disks on different computers (or even disks on computers at different locations).

An example of a sentence using the second meaning is: "Bob installed Linux using only the ext3 filesystem instead of using both the ext2 and ext3 filesystems." This refers to the fact that a single Linux installation can contain one or multiple types of filesystems. One hard disk can contain one or multiple types of filesystems (each on at least one separate partition), and a filesystem of a single type can be spread across multiple hard disks.

This article is concerned primarily with filesystems in the second sense. However, because of the intimate relationship between the structure of filesystems and types of filesystems, the next section provides a quick review of (or introduction to) Linux filesystems in the first sense.

Filesystem Structure

In Linux, everything is treated as a file. This includes not only text files, images and compiled programs (also referred to as executables), but also directories, partitions and hardware devices, which are accessed through special device files.

Each filesystem (used in the first sense) contains a control block, known as the superblock, which holds information about that filesystem. The remaining blocks are of two kinds: blocks of inodes, which contain information about individual files, and data blocks, which contain the information stored in the individual files.

There is a substantial difference between the way the user sees the Linux filesystem (first sense) and the way the kernel (the core of a Linux system) actually stores the files. To the user, the filesystem appears as a hierarchical arrangement of directories that contain files and other directories (i.e., subdirectories). Directories and files are identified by their names. This hierarchy starts from a single directory called root, which is represented by a "/" (forward slash).

(The meanings of root and "/" are often confusing to new users of Linux. This is because each has two distinct usages. The other meaning of root is a user who has administrative privileges on the computer, in contrast to ordinary users, who have only limited privileges in order to protect system security. The other use of "/" is as a separator between directories or between a directory and a file, similar to the backward slash used in MS-DOS.)

The Filesystem Hierarchy Standard (FHS) defines the main directories and their contents in Linux and other Unix-like operating systems. All files and directories appear under the root directory, even if they are stored on different physical devices (e.g., on different disks or on different computers). A few of the directories defined by the FHS are /bin (command binaries for all users), /boot (boot loader files such as the kernel), /home (users' home directories), /mnt (for mounting a CDROM or floppy disk), /root (home directory for the root user), /sbin (executables used only by the root user) and /usr (where most application programs get installed).
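The point that a single directory tree can span several devices can be checked from a program. The following is a minimal Python sketch, not part of the FHS itself; the helper names device_of and same_filesystem are made up for this illustration. It relies on the device ID that the stat call reports for every path: two paths with different device IDs live on different mounted filesystems, even though both appear under "/".

```python
import os

def device_of(path):
    """Return the ID of the device (st_dev) that holds the given path."""
    return os.stat(path).st_dev

def same_filesystem(path_a, path_b):
    """Return True if both paths are stored on the same mounted filesystem."""
    return device_of(path_a) == device_of(path_b)

if __name__ == "__main__":
    # The directories below are examples from the FHS; some may not exist
    # on a given system, so each one is checked before it is used.
    for directory in ("/", "/boot", "/home", "/usr"):
        if os.path.isdir(directory):
            kind = "mount point" if os.path.ismount(directory) else "ordinary directory"
            print(directory, kind, "device", device_of(directory))
```

On a system where /home is on its own partition, the output would show /home as a mount point with a device ID different from that of "/".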

To the Linux kernel, however, the filesystem is flat. That is, it does not (1) have a hierarchical structure, (2) differentiate between directories, files or programs or (3) identify files by names. Instead, the kernel uses inodes to represent each file.

An inode is actually an entry in a list of inodes referred to as the inode list. Each inode contains information about a file including (1) its inode number (a unique identification number), (2) the owner and group associated with the file, (3) the file type (for example, whether it is a regular file or a directory), (4) the file's permission list, (5) the times of the file's last inode change, last access and last data modification, (6) the size of the file and (7) the disk address (i.e., the location on the disk where the file is physically stored).
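Most of the inode fields listed above can be read from a program. The sketch below uses Python's standard os and stat modules; the function names describe_inode and hard_link_demo are invented for this example. The disk address is not exposed through this interface, so it is omitted. The second function creates two names (hard links) for one file to illustrate that names are only labels that point to an inode.

```python
import os
import stat
import tempfile

def describe_inode(path):
    """Collect the inode fields that os.lstat exposes for one path."""
    info = os.lstat(path)
    if stat.S_ISDIR(info.st_mode):
        file_type = "directory"
    elif stat.S_ISREG(info.st_mode):
        file_type = "regular file"
    else:
        file_type = "other"
    return {
        "inode": info.st_ino,                         # inode number
        "owner_uid": info.st_uid,                     # numeric owner
        "group_gid": info.st_gid,                     # numeric group
        "type": file_type,
        "permissions": stat.filemode(info.st_mode),   # e.g. -rw-r--r--
        "accessed": info.st_atime,
        "modified": info.st_mtime,
        "inode_changed": info.st_ctime,               # on Linux: last inode change
        "size": info.st_size,
        "links": info.st_nlink,                       # number of names for this inode
    }

def hard_link_demo():
    """Create one file with two names and describe both names."""
    with tempfile.TemporaryDirectory() as folder:
        first = os.path.join(folder, "first.txt")
        second = os.path.join(folder, "second.txt")
        with open(first, "w") as handle:
            handle.write("hello")
        os.link(first, second)  # second name for the same inode
        return describe_inode(first), describe_inode(second)

if __name__ == "__main__":
    for description in hard_link_demo():
        print(description)
```

Both descriptions report the same inode number and a link count of 2, because the two names refer to a single inode.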

The inode numbers for the contents of a directory can be seen by using the -i option with the familiar ls (i.e., list) command in a terminal window:

ls -i
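The same information can be obtained from a program. The following Python sketch (the function name list_inodes is hypothetical) returns the inode number and name of each entry in a directory. Unlike ls -i without further options, it also includes entries whose names begin with a dot.

```python
import os

def list_inodes(directory="."):
    """Return (inode number, name) pairs for a directory, sorted by name."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        # lstat reports on the entry itself and does not follow symbolic links
        info = os.lstat(os.path.join(directory, name))
        pairs.append((info.st_ino, name))
    return pairs

if __name__ == "__main__":
    for number, name in list_inodes("."):
        print(number, name)
```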

The df command is used to show information about each of the filesystems which are currently mounted on (i.e., connected to) a system, including their allocated maximum size, the amount of disk space they are using, the percentage of their disk space they are using and where they are mounted (i.e., the mountpoint). (Here filesystems is used as a variant of the first meaning, referring to the parts of the entire hierarchy of directories.)

df can be used by itself, but it is often more convenient to add the -m option to show sizes in megabytes rather than in the default kilobytes:

df -m

A column showing the type of each of these filesystems can be added to the filesystem table produced by the above command by using the --print-type option, i.e.:

df -m --print-type

This command generates a column labeled Type. For a Red Hat Linux installation on a home computer most of the entries in this column will probably be ext3 and/or ext2.
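A rough equivalent of this table can be produced from a program. The Python sketch below is an illustration only and does not show how df itself is implemented. It assumes a Linux system that provides the mount table in the file /proc/mounts (an assumption not made elsewhere in this article), and the function names are invented for the example. Sizes are rounded down to whole megabytes, so they may differ slightly from the df output.

```python
import os
import shutil

def parse_mount_table(text):
    """Parse text in /proc/mounts format into (device, mount point, type) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            rows.append((fields[0], fields[1], fields[2]))
    return rows

def usage_in_megabytes(mount_point):
    """Return (total, used, free) for a mount point in units of 1024 * 1024 bytes."""
    usage = shutil.disk_usage(mount_point)
    megabyte = 1024 * 1024
    return usage.total // megabyte, usage.used // megabyte, usage.free // megabyte

if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as table:
            rows = parse_mount_table(table.read())
        for device, mount_point, fs_type in rows:
            try:
                total, used, free = usage_in_megabytes(mount_point)
            except OSError:
                continue  # skip mount points that cannot be queried
            print(device, fs_type, total, used, free, mount_point)
```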

Linux Native Filesystems

Every native Linux filesystem implements a basic set of common concepts that were derived from those originally developed for Unix. (Native means that the filesystems were either developed originally for Linux or were first developed for other operating systems and then rewritten so that they would have functions and performance on Linux comparable or superior to those of filesystems originally developed for Linux.)

Several Linux native filesystems are currently in widespread use, including ext2, ext3, ReiserFS, JFS and XFS. Additional native filesystems are in various stages of development.

These filesystems differ from the DOS/Windows filesystems in a number of ways including (1) allowing important system folders to span multiple partitions and multiple hard drives, (2) storing additional information about files, including ownership and permissions and (3) establishing a number of standard folders for holding important components of the operating system.

Linux's first filesystem was minix, which was borrowed from the Minix OS. Linus Torvalds adopted this filesystem because it was an efficient and relatively bug-free piece of existing software that postponed the need to design a new filesystem from scratch.

However, minix was not well suited for use on Linux hard disks for several reasons, including its maximum partition size of only 64MB, its short filenames and its single timestamp. But minix can be useful for floppy disks and RAM disks because its low overhead can sometimes allow more files to be stored than is possible with other Linux filesystems.

The Extended File System, ext, was introduced in April, 1992. With a maximum partition size of 2GB and a maximum file name size of 255 characters, it removed the two biggest minix limitations. However, there still was no support for the separate access, inode modification and data modification timestamps. Also, its use of linked lists to keep track of free blocks and inodes caused the lists to become unsorted and the filesystem to become fragmented.

The Second Extended File System (ext2) was released in January, 1993. It was a rewrite of ext that featured (1) improved algorithms that greatly increased its speed, (2) additional date stamps (such as date of last access, date of last inode modification and date of last data modification) and (3) the ability to track the state of the filesystem. Ext2 maintains a special field in the superblock that indicates the status of the filesystem as either clean or dirty. A dirty filesystem will trigger a utility to scan the filesystem for errors. Ext2 also supports a maximum file size of 4TB (1 terabyte is 1024 gigabytes). Consequently, it has completely superseded ext, support for which has been removed from the Linux kernel.

Ext2 is the most portable of the native Linux filesystems because drivers and other tools exist that allow accessing ext2 data from a number of other operating systems. However, as useful as these tools are, most of them have limitations, such as being access utilities rather than true drivers, not working with the most recent versions of ext2, not being able to write to ext2 or posing a risk of causing filesystem corruption when writing to ext2.

Journaling Filesystems

The lack of a journaling filesystem was often cited as one of the major factors holding back the widespread adoption of Linux at the enterprise level. However, this objection is no longer valid, as there are now four such filesystems from which to choose.

Journaling filesystems offer several important advantages over non-journaling filesystems, such as ext2. In particular, if the system is halted without a proper shutdown, they keep the structure of the filesystem consistent and eliminate the need for a long and complex filesystem check during rebooting. The term journaling derives from the fact that a special file called a journal is used to keep track of the changes that are being written to the hard disk.

In the case of conventional filesystems, disk checks during rebooting after a power failure or other system crash can take many minutes, or even hours for large hard disk drives with capacities of hundreds of gigabytes. Moreover, if an inconsistency in the data is found, human intervention is sometimes necessary to answer complicated questions about how to fix certain filesystem problems. Such downtime can be very costly with big systems used by large organizations.

In the case of a journaling filesystem, if the power supply to the computer is suddenly interrupted, a given set of updates will be in one of two states. Either it will have been fully committed to the filesystem (i.e., written to the hard disk), in which case there is no problem and the filesystem can be used immediately, or it will have been marked as not yet fully committed, in which case the filesystem driver can read the journal and fix any inconsistencies that occurred. This is far quicker than a scan of the entire hard disk, and it guarantees that the structure of the filesystem is always self-consistent. With a journaling filesystem, a computer can usually be rebooted in just a few seconds after a system crash, and although some data might be lost, at least it will not take many minutes or hours to discover this fact.
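The all-or-nothing behavior described above can be illustrated with a toy model. The Python sketch below is a deliberately simplified, in-memory simulation; the class ToyJournaledDisk and its methods are hypothetical and do not reflect the on-disk journal format of ext3, ReiserFS, JFS or XFS. Updates are first recorded in a journal, then marked as committed, and only then copied to the main area of the disk. After a simulated crash, recovery replays committed transactions and discards uncommitted ones, without scanning the whole disk.

```python
class ToyJournaledDisk:
    """Hypothetical in-memory model of a disk with a write-ahead journal."""

    def __init__(self):
        self.blocks = {}    # main area of the disk: block number -> contents
        self.journal = []   # pending transactions, oldest first

    def begin(self, updates):
        """Record the intended updates in the journal, not yet committed."""
        transaction = {"updates": dict(updates), "committed": False}
        self.journal.append(transaction)
        return transaction

    def commit(self, transaction):
        """Mark the transaction as completely recorded in the journal."""
        transaction["committed"] = True

    def checkpoint(self, transaction, crash_after=None):
        """Copy journaled updates to the main area.

        If crash_after is given, stop after that many blocks to simulate a
        power failure, and return False.
        """
        for count, (block, contents) in enumerate(transaction["updates"].items()):
            if crash_after is not None and count >= crash_after:
                return False  # simulated power failure
            self.blocks[block] = contents
        # All blocks are in place, so the journal entry is no longer needed.
        self.journal = [t for t in self.journal if t is not transaction]
        return True

    def recover(self):
        """Replay committed transactions and discard uncommitted ones."""
        for transaction in self.journal:
            if transaction["committed"]:
                self.blocks.update(transaction["updates"])
        self.journal = []

if __name__ == "__main__":
    disk = ToyJournaledDisk()
    update = disk.begin({1: "new inode", 2: "new data"})
    disk.commit(update)
    disk.checkpoint(update, crash_after=1)  # crash with only block 1 written
    disk.recover()                          # replay brings block 2 up to date
    print(disk.blocks)
```

Recovery time in this model depends only on the length of the journal, not on the size of the disk, which is the reason a journaling filesystem can be usable again within seconds.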

Ext3 has been integrated into the Linux kernel since version 2.4.16 and has become the default filesystem on Red Hat and some other distributions. It is basically an extension of ext2 to which a journaling capability has been added, and it provides the same high degree of reliability because of the exhaustively field-proven nature of its underlying ext2. Also featured is the ability for ext2 partitions to be converted to ext3 and vice-versa without any need for backing up the data and repartitioning. If necessary, an ext3 partition can even be mounted by an older kernel that has no ext3 support; this is because it would be seen as just another normal ext2 partition and the journal would be ignored.

ReiserFS, developed by Hans Reiser and others, was actually the first journaling filesystem added to the Linux kernel. As was the case with ext2, it was designed from the ground up for use in Linux. However, unlike ext3, it was also designed from the ground up as a journaling filesystem rather than as an add-on to an existing filesystem, and thus it is widely considered to be the most advanced of the native Linux journaling filesystems. Features include high speed, excellent stability and the ability to pack small files into less disk space than is possible with many other filesystems.

A new version of ReiserFS, designated Reiser4, was scheduled for release in the first half of 2004. It is a complete rewrite from version 3 and is said to result in major improvements in performance, including higher speeds, the ability to accommodate more CPUs, built-in encryption and ease of customization.

JFS was originally developed by IBM in the mid-1990s for its AIX Unix operating system, and it was later ported to the company's OS/2 operating system. IBM subsequently changed the licensing of the OS/2 implementation to open source, which led to its support on Linux. JFS is currently used primarily on IBM enterprise servers, and it is also a good choice for systems that multiboot Linux and OS/2.

XFS was developed in the mid-1990s by Silicon Graphics (SGI) for its 64-bit IRIX Unix servers. These servers were designed with advanced graphics processing in mind, and they feature the ability to accommodate huge file sizes. The company likewise converted XFS to open source, after which it was also adopted by Linux. Because it is a 64-bit filesystem, XFS features size limitations in the millions of terabytes (in contrast to the still generous 4TB limit of ext2).

Most Linux distributions that ship with 2.4.x and later kernels support ext2, ext3 and ReiserFS. Support for JFS has been added to the 2.4.20 and 2.5.6 kernels, and XFS was added to the 2.5.36 kernel. JFS and XFS support can be added to earlier kernels by downloading the appropriate patches from the respective websites and compiling as a module or into the kernel. Partitions can then be converted by backing up the data, creating the new filesystem and then restoring the data.

Supported Foreign Filesystems

Unlike most other operating systems, Linux supports a large number of foreign filesystems in addition to its native filesystems. This is possible because of the virtual file system layer, which was incorporated into Linux from its infancy and makes it easy to mount other filesystems. In addition to reading, foreign filesystem support also often includes writing, copying, erasing and other operations.
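The idea behind such a layer can be sketched in a few lines. The Python example below is a conceptual illustration only; all of the class and method names are invented, and it is not a description of the actual kernel interfaces. Every filesystem type implements the same small set of operations, and a mount table decides which implementation handles a given path.

```python
from abc import ABC, abstractmethod

class ToyFilesystem(ABC):
    """Operations that every filesystem type in this sketch must provide."""

    @abstractmethod
    def read(self, path):
        ...

    @abstractmethod
    def write(self, path, data):
        ...

class MemoryFilesystem(ToyFilesystem):
    """Stands in for a native filesystem with full read and write support."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data

class ReadOnlyFilesystem(MemoryFilesystem):
    """Stands in for a foreign filesystem that is supported for reading only."""

    def write(self, path, data):
        raise PermissionError("read-only filesystem")

class ToyVfs:
    """Routes each path to the filesystem mounted at the longest matching prefix."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mount_point, filesystem):
        self.mounts[mount_point.rstrip("/") or "/"] = filesystem

    def _resolve(self, path):
        candidates = [m for m in self.mounts
                      if path == m or path.startswith(m.rstrip("/") + "/")]
        best = max(candidates, key=len)  # assumes "/" has been mounted
        return self.mounts[best], "/" + path[len(best):].lstrip("/")

    def read(self, path):
        filesystem, relative = self._resolve(path)
        return filesystem.read(relative)

    def write(self, path, data):
        filesystem, relative = self._resolve(path)
        filesystem.write(relative, data)

if __name__ == "__main__":
    vfs = ToyVfs()
    vfs.mount("/", MemoryFilesystem())
    vfs.mount("/mnt/windows", ReadOnlyFilesystem({"/report.txt": "shared data"}))
    vfs.write("/etc/motd", "welcome")
    print(vfs.read("/etc/motd"), vfs.read("/mnt/windows/report.txt"))
```

Because callers use only the common operations, a new filesystem type can be supported by adding one more class, without changes to the programs that read and write files.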

Among the most commonly used PC filesystems is FAT (File Allocation Table). This is the primary filesystem for MS-DOS and Microsoft Windows 95, 98 and ME, and it is also supported by Windows NT, 2000 and XP and most other operating systems. The first variant, FAT16, was Microsoft's standard filesystem until Windows 95, and the subsequent FAT32 is the standard for Windows 98 and Windows ME. Linux supports both reading from and writing to FAT16 and FAT32, and their main use on Linux is to share files with Microsoft Windows on dual-boot systems and through floppies.

FAT filesystems cannot accommodate information about files such as ownership and permissions. Also, FAT16 partitions are limited to a maximum of 2GB. Although the theoretical maximum size for FAT32 partitions is 8TB, Windows 98's scandisk (disk checking utility) only supports 128GB, and Windows 2000 does not permit the creation of FAT32 disks larger than 32GB.
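The 2GB and 8TB figures follow from simple arithmetic. The short Python calculation below is a sketch under stated assumptions that are not given in the text above: that FAT16 numbers its clusters (allocation units) with 16 bits, that FAT32 uses 28 bits for cluster numbers, and that the largest cluster size is 32 kilobytes. It uses round figures and ignores the few cluster numbers that are reserved in practice.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

def max_volume_bytes(cluster_number_bits, cluster_size_bytes):
    """Upper bound on volume size: number of clusters times cluster size."""
    return (2 ** cluster_number_bits) * cluster_size_bytes

# Assumed parameters: 16-bit and 28-bit cluster numbers, 32 KiB clusters.
fat16_limit = max_volume_bytes(16, 32 * KIB)
fat32_limit = max_volume_bytes(28, 32 * KIB)

if __name__ == "__main__":
    print("FAT16:", fat16_limit // GIB, "GB")
    print("FAT32:", fat32_limit // TIB, "TB")
```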

NTFS is Microsoft's replacement for FAT. A descendant of HPFS (the native filesystem for IBM's OS/2 operating system), NTFS's purpose was to remove the limitations of the FAT filesystem (such as poor stability) while adding new features not found in HPFS. Of the Windows operating systems, it can only be accessed by NT, 2000 and XP. Under Linux, NTFS is currently supported only in read-only mode and only on some distributions.

HFS (Hierarchical File System) is the native filesystem used on most Macintosh computers, and it is sometimes said to be "the Macintosh equivalent of FAT." However, Linux's support for HFS is not as complete as that for many other filesystems. As most Macintoshes include FAT support, it thus might be preferable in some situations to use this filesystem instead of HFS when exchanging data with Macintosh computers.

ISO 9660, released in 1988 by an industry committee called High Sierra, is the standard filesystem for CDROMs. Almost all computers with CDROM drives can read files written in ISO 9660 regardless of their operating system.

How to Select the Most Appropriate Filesystem

When installing Linux, the optimal selection of filesystem(s) depends on several factors, particularly the intended application for the computer(s) and the types of partitions on which they are to be installed. In the case of most computers for individual users, ext2 or ext3 (the default on many such systems) is usually quite adequate.

For large-scale and high performance systems, however, making the optimal selection is not as easy and it is much more important. This is because the choice of filesystems can have very noticeable effects on performance, on recovery from errors, on compatibility with other operating systems and on limitations on partition and file sizes. One generalization that can be made with regard to such systems is that it is usually advantageous to use a journaling filesystem because of the greatly reduced startup times after system crashes.

For the boot and root partitions, it can be advantageous to use an ext2 or ext3 filesystem because this will allow booting in an emergency even with an older kernel. For other Linux partitions, ext3 or ReiserFS are usually the best choices, the former where ext2 compatibility is emphasized and the latter where performance is paramount. When partitions need to be accessible to both Linux and Microsoft Windows, FAT should be selected.

The questions of which filesystems provide the best disk performance and minimize processor time are not easy to answer. Some studies suggest that XFS and JFS produce the best throughput with small files (e.g., 100MB), while ext2 and ext3 are the best with larger files (e.g., 1GB). However, this situation could change with the introduction of the new version of ReiserFS, with its claims of greatly enhanced speed and scalability.

The choice of journaling filesystem can affect disk space availability because of the amount of space needed for the journal. This is a major consideration on small disks, such as Zip disks. For example, on a 100MB Zip disk, ext3 and XFS each devote 4MB to their journals whereas ReiserFS devotes several times this amount to its journal.

Optimizing Linux Filesystems

System performance can be optimized not only by selecting the most appropriate filesystem(s), but also by utilizing the various options that are available for most filesystems. There are differences in the availability of options according to the particular filesystem, and some options can be set only at filesystem creation time, while others can be changed later.

One generally available option is the ability to set the size of the allocation blocks (the units into which a partition is subdivided). Smaller allocation blocks can facilitate more efficient use of disk space, whereas larger blocks can improve performance by reducing both file fragmentation and the time needed to retrieve an entire file. It is not easy to change this option after creation of the filesystem.
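The tradeoff can be made concrete with a small calculation. The Python sketch below uses made-up file sizes and two example block sizes (1024 and 4096 bytes) and assumes the simplest possible scheme, in which every file occupies a whole number of blocks. It ignores refinements such as the packing of small files that some filesystems offer.

```python
def blocks_needed(file_size, block_size):
    """Number of whole blocks required to hold a file of the given size."""
    return (file_size + block_size - 1) // block_size  # integer ceiling

def wasted_bytes(file_sizes, block_size):
    """Total space lost to partly filled final blocks."""
    return sum(blocks_needed(size, block_size) * block_size - size
               for size in file_sizes)

if __name__ == "__main__":
    example_sizes = [100, 1500, 5000, 300000]  # hypothetical file sizes in bytes
    for block_size in (1024, 4096):
        print("block size", block_size,
              "wasted", wasted_bytes(example_sizes, block_size),
              "blocks for largest file", blocks_needed(300000, block_size))
```

For these example files, 1024-byte blocks waste 1624 bytes in total while 4096-byte blocks waste 12888 bytes. On the other hand, the 300000-byte file needs 293 of the smaller blocks but only 74 of the larger ones, so there are fewer pieces to allocate and retrieve.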

All of the journaling filesystems support various journal options. For example, a choice of three data journaling modes in ext3 provides tradeoffs between data integrity and system recovery speed. Also, system performance can often be improved by using the journal location option to place the journal on a different physical disk than the main filesystem.

Another type of option is the number of blocks automatically set aside for use by the root user. For example, for both ext2 and ext3 the default value is five percent. This might be excessive on large partitions or on relatively non-critical partitions, and reducing this value could make a small amount of additional disk space available to non-root users.
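The amount of space involved can be estimated, and on a running system it can also be measured. The Python sketch below is an illustration with invented function names. The first function simply computes a percentage of a partition size. The second uses the statvfs call, which is available on Unix-like systems, and treats the difference between the blocks that are free for root and the blocks that are available to ordinary users as the reserved amount. On filesystems that reserve nothing, this difference is zero.

```python
import os

GIB = 1024 ** 3

def reserved_estimate(partition_bytes, percent=5):
    """Space set aside for root on a partition, given a reserved percentage."""
    return partition_bytes * percent // 100

def reserved_for_root(path="/"):
    """Return (reserved bytes, percent of all blocks) for the filesystem holding path."""
    info = os.statvfs(path)
    # f_bfree counts blocks free for root; f_bavail counts blocks for other users
    reserved_blocks = info.f_bfree - info.f_bavail
    reserved_bytes = reserved_blocks * info.f_frsize
    percent = 100.0 * reserved_blocks / info.f_blocks if info.f_blocks else 0.0
    return reserved_bytes, percent

if __name__ == "__main__":
    # Five percent of a hypothetical 100 GB partition is 5 GB.
    print(reserved_estimate(100 * GIB) // GIB, "GB reserved on a 100 GB partition")
    print(reserved_for_root("/"))
```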

In contrast to the filesystems used for Microsoft Windows, fragmentation of files is usually not a major problem with Linux filesystems due to their fundamental differences in design. (Fragmentation refers to parts of files becoming scattered around random, non-contiguous locations on a disk, resulting in reduced speed and reliability.) Thus, whereas the Microsoft Windows operating systems include utilities for defragmentation and encourage their regular use, such utilities are difficult to find for Linux. When Linux users who have come from the Windows world encounter sluggish performance, they are often tempted to attribute it to fragmentation; but it is much more likely the result of running short of memory (and using relatively slow swap disk space instead of RAM) and/or running too many processes (i.e., programs running in the background).

One final word concerns the dozens of filesystems available for Linux. Although having so much to choose from can seem confusing at first, this is just a part of what makes Linux so uniquely flexible and accommodating, i.e., an unparalleled freedom of choice. It is also part of what makes Linux so much fun, at least for those who make the effort to learn about it.




Created April 16, 2004. Copyright © 2004. All Rights Reserved.