by Roderick W. Smith
Everything you do with Linux involves files in one way or another. You
launch programs from files, read program configurations in files, store
data in files, deliver files to clients via servers, and so on.
Therefore, the tools Linux provides for manipulating files are
extremely important to overall system performance. At the core, these
tools make up a filesystem -- a set of data structures that allow Linux
to locate and manipulate files. Several Linux native filesystems exist,
the most important of these being the Second Extended File System
(ext2fs), the Third Extended File System (ext3fs), the Reiser File
System (ReiserFS), the Journaled File System (JFS), and XFS. The
filesystem you use will affect your computer's overall performance and
suitability for specific tasks.
Beyond picking a filesystem, you should be familiar
with various filesystem tools. Filesystem creation options and
performance enhancing tools can improve disk throughput, and partition
resizers enable you to grow or shrink a partition to better suit your
storage needs. Filesystems sometimes become corrupted, and fixing these
problems is critical when they occur. Finally, one very common problem
is that of accidentally or prematurely deleted files. Knowing how to
recover such files can save you or your users a lot of time and effort.
Picking the right filesystem
When you installed Linux, the installation program
gave you options relating to the filesystems you could use. Most
distributions that ship with 2.4.x and later kernels support ext2fs,
ext3fs, and ReiserFS. Some also support JFS and XFS. Even if your
distribution doesn't support JFS or XFS, though, you can add that
support by downloading the appropriate kernel patches or prepatched
kernels from the JFS or XFS
sites and compiling this support as a module or into the kernel proper.
You can then convert a partition from one filesystem to another by
backing up, creating the new filesystem, and restoring. (Support for
JFS has been added to the 2.4.20 and 2.5.6 kernels, and XFS has been
added to the 2.5.36 kernel. Thus, these filesystems are likely to
become options for most distributions at install time.)
Unfortunately, the best filesystem to use is not
always obvious. For many installations, it's not even terribly
important, but for some applications it is. Design differences mean
that some filesystems perform certain tasks better than others.
Varying support tools also mean that advanced filesystem features
differ. This section describes the pros and cons of the popular Linux
filesystems in several different areas, such as filesystem portability,
disk check times, disk speed, disk space consumption, support for large
numbers of files, and advanced security features.
Maximizing filesystem portability
Ext2fs is the most portable native Linux filesystem.
Drivers and access tools for ext2fs are available in many different
OSs, meaning that you can access ext2fs data from many non-Linux OSs.
Unfortunately, most of these tools are limited in various ways -- for
instance, they may be access utilities rather than true drivers, they
may not work with the latest versions of ext2fs, they may be able to
read but not write ext2fs, or they may run a risk of causing filesystem
corruption when writing to ext2fs. Therefore, ext2fs's portability is
limited in practice.
Ext3fs is a journaling extension to ext2fs. (The
next section, "Reducing Disk Check Times," describes journaling in more
detail.) As such, many of the ext2fs access tools can handle ext3fs,
although some disable write access on ext3 filesystems.
IBM wrote JFS for its AIX OS, and later ported it to
OS/2. IBM then open sourced the OS/2 JFS implementation, leading to the
Linux JFS support. This heritage makes JFS a good choice for systems
that multiboot Linux and OS/2. There are compatibility issues, though.
Most importantly, you must use 4,096-byte clusters to enable both OSs
to use the same JFS partitions. There are also filename case-retention
issues -- OS/2 is case-insensitive, whereas Linux is case-sensitive.
You can use JFS in a case-insensitive way from Linux, but this is only
advisable on dedicated data-transfer partitions.
XFS, from Silicon Graphics' (SGI's) IRIX, is another
migrant filesystem. Linux/IRIX dual-boot systems are rare, but you
might want to use XFS as a compatibility filesystem on removable disks
that move between Linux and IRIX systems. You can also use Linux's XFS
support to read hard disks that originated on IRIX systems.
ReiserFS is currently the least portable of the major Linux-native filesystems. There is a BeOS version,
but versions for other platforms have yet to appear. Therefore, you
should avoid ReiserFS if you need cross-platform compatibility.
Reducing disk check times
All filesystems necessarily write data in chunks. In
the event of a power outage, system crash, or other problem, the disk
may be left in an unstable condition as a result of a half-completed
operation. The result can be lost data and disk errors down the line.
In order to head off such problems, modern filesystems support a dirty
bit. When Linux mounts a filesystem, it sets the dirty bit, and when it
unmounts the filesystem, Linux clears the dirty bit. If Linux detects
that the dirty bit is set when mounting a filesystem, the OS knows that
the filesystem was not properly unmounted and may contain errors.
Depending on the mount command options, Linux may run fsck
on the filesystem when its dirty bit is set. This program, described in
more detail in the upcoming section, "Recovering from Filesystem
Corruption," checks for disk errors and corrects them whenever possible.
Unfortunately, a complete disk check on a
traditional filesystem such as ext2fs takes a long time, because the
computer must scan all the major disk data structures. If an
inconsistency is found,
fsck must resolve it. The program
can often do this on its own, but it sometimes requires help from a
person, so you may have to answer bewildering questions about how to
fix certain filesystem problems after a crash or other system failure.
Even without answering such questions, disk checks of multigigabyte
hard disks can take many minutes, or potentially even hours. This
characteristic may be unacceptable on systems that should have minimal
down time, such as many servers.
Over the past decade, journaling filesystems have
received increasing attention as a partial solution to the disk check
time problem. A journaling filesystem keeps an on-disk record of
pending operations. When the OS writes data to the disk, it first
records a journal entry describing the operation; then it performs the
operation; and then it clears the journal. In the event of a power
failure or crash, the journal contains a record of all the operations
that might be pending. This information can greatly simplify the
filesystem check operation; instead of checking the entire disk, the
system can check just those areas noted in the journal as having
pending operations. The result is that a journaling filesystem takes
just a few seconds to mount after a system crash. Of course, some data
might still be lost, but at least you won't wait many minutes or hours
to discover this fact.
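The record, perform, clear sequence described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the JournaledStore class, its dictionary of "blocks," and the choice to replay pending entries during recovery are all hypothetical simplifications, and no real filesystem's on-disk journal format is implied.

```python
class JournaledStore:
    """Toy model of a journaling filesystem: data blocks plus a journal."""

    def __init__(self):
        self.blocks = {}    # block number -> data (stands in for the disk)
        self.journal = []   # pending (block, data) entries, also "on disk"

    def write(self, block, data, crash_before_apply=False):
        # 1. Record a journal entry describing the operation.
        self.journal.append((block, data))
        if crash_before_apply:
            return  # simulated power failure: entry recorded, data not written
        # 2. Perform the operation.
        self.blocks[block] = data
        # 3. Clear the journal entry.
        self.journal.remove((block, data))

    def recover(self):
        """After a crash, look only at journaled blocks, not the whole disk."""
        checked = [block for block, _ in self.journal]
        for block, data in self.journal:
            self.blocks[block] = data   # this sketch replays pending operations
        self.journal.clear()
        return checked


store = JournaledStore()
store.write(1, "alpha")
store.write(2, "beta", crash_before_apply=True)
checked_blocks = store.recover()
print("blocks examined during recovery:", checked_blocks)
```

Only the block named in the journal is examined during recovery, which is why the check finishes quickly no matter how large the "disk" is.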
Linux supports four journaling filesystems:
Ext3fs: This filesystem is basically just ext2fs with a
journal added. As such, it's quite reliable, because of the well-tested
nature of the underlying ext2fs. Ext3fs can also be read by an ext2fs
driver; however, when it's mounted in this way, the journal will be
ignored. Ext3fs also has another advantage: As described in the
upcoming section, "Converting Ext2fs to Ext3fs," you can convert an
existing ext2 filesystem into an ext3 filesystem without backing up,
repartitioning, and restoring.
ReiserFS: This filesystem was the first journaling filesystem
added to the Linux kernel. As such, it's seen a lot of testing and is
very reliable. It was designed from the ground up as a journaling
filesystem for Linux, and it includes several unusual design features,
such as the ability to pack small files into less disk space than is
possible with many filesystems.
JFS: IBM's JFS was developed in the mid-1990s for AIX,
then it found its way to OS/2 and then to Linux. It's therefore well
tested, although the Linux version hasn't seen much use compared to the
non-Linux version or even ext3fs or ReiserFS on Linux.
XFS: SGI's XFS dates from the mid-1990s on the IRIX
platform, so the filesystem fundamentals are well tested. It's the most
recent official addition to the Linux kernel, although it has been a
fairly popular add-on for quite a while. XFS comes with more ancillary
utilities than does any filesystem except ext2fs and ext3fs. It also
comes with native support for some advanced features, such as ACLs (see
the upcoming section, "Securing a Filesystem with ACLs"), that aren't as
well supported on most other filesystems.
For the most part, I recommend using a journaling
filesystem; the reduced startup time makes these filesystems beneficial
after power outages or other problems. Some of these filesystems do
have drawbacks, though. Most importantly, some programs rely upon
filesystem quirks in order to work. For instance, as late as 2001,
programs such as NFS servers and the Win4Lin emulator had problems with
some of these journaling filesystems. These problems have been
disappearing, though, and they're quite rare as of the 2.5.54 kernel.
Nonetheless, you should thoroughly test all your programs (especially
those that interact with disk files in low-level or other unusual ways)
before switching to a journaling filesystem. The safest journaling
filesystem from this perspective is likely to be ext3fs, because of its
close relationship to ext2fs.
ReiserFS and JFS are also somewhat deficient in terms of support
programs. For instance, neither includes a dump backup utility. XFS's
dump utility (xfsdump) is available from the XFS development site but
isn't shipped with the xfsprogs 2.2.1 package, although some
distributions ship it in a separate xfsdump package. The xfsdump and
the ext2fs/ext3fs dump programs create incompatible archives, so you
can't use these tools to back up one filesystem and restore it to
another.
Maximizing disk throughput
One question on many people's minds is which
filesystem yields the best disk performance. Unfortunately, this
question is difficult to answer because different access patterns, as
created by different uses of a system, favor different filesystem
designs. In Linux Filesystems (Sams, 2001), William von Hagen ran many
benchmarks and found that every Linux filesystem won several individual
tests. As a general rule, though, XFS and JFS produced the best
throughput with small files (100MB), while ext2fs, ext3fs, and to a
lesser extent JFS did the best with larger files (1GB). Some benchmarks
measure CPU use, which can affect system responsiveness during
disk-intensive operations. At small file sizes, results were quite
variable; no filesystem emerged as a clear winner. At larger file
sizes, ext3fs and JFS emerged as CPU-time winners.
Unfortunately, benchmarks are somewhat artificial
and may not reflect real-world performance. For instance, von Hagen's
benchmarks show ext2fs winning file-deletion tests and ReiserFS coming
in last; however, von Hagen comments that this result runs counter to
his subjective experience, and I concur. ReiserFS seems quite speedy
compared to ext2fs when deleting large numbers of files. This disparity
may be because von Hagen's tests measured CPU time, whereas we humans
are more interested in a program's response time. The moral is that you
shouldn't blindly trust a benchmark. If getting the best disk
performance is important to you, try experimenting yourself. Be sure to
run tests using the same hardware and partition; wipe out each
filesystem in favor of the next one, so that you're testing using the
same disk and partition each time. Install applications or user files,
as appropriate, and see how fast the system is for your specific
purposes. If this procedure sounds like it's too much effort to
perform, then perhaps the performance differences between filesystems
aren't all that important to you, and you should choose a filesystem
based on other criteria.
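As a starting point for such experiments, here is a minimal Python sketch that times the creation and deletion of many small files in a directory. The function name, file names, and default counts are hypothetical. It measures wall-clock response time rather than CPU time, and it makes no attempt to flush caches, so treat its numbers as rough indications only.

```python
import os
import tempfile
import time


def time_small_files(directory, count=1000, size=1024):
    """Create, then delete, `count` files of `size` bytes; return both timings."""
    payload = b"x" * size
    paths = [os.path.join(directory, f"bench_{i}.dat") for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as handle:
            handle.write(payload)
    create_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    delete_seconds = time.perf_counter() - start

    return create_seconds, delete_seconds


# A temporary directory is used here; point this at a directory on the
# filesystem under test to compare one filesystem with another.
with tempfile.TemporaryDirectory() as workdir:
    created, deleted = time_small_files(workdir, count=200)
    leftover = os.listdir(workdir)

print(f"create: {created:.4f}s  delete: {deleted:.4f}s")
```

Running the same script against the same partition, reformatted with each candidate filesystem in turn, keeps the hardware constant between trials.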
Minimizing space consumption
Most filesystems allocate space to files in blocks, which are typically
power-of-two multiples of 512 bytes in size (that is, 2^1 x 512,
2^2 x 512, 2^3 x 512, and so on). Common block sizes for Linux
filesystems range from
1KB to 4KB (the range for ext2fs and ext3fs). XFS supports block sizes
ranging from 512 bytes to 64KB, although in practice block size is
limited by CPU architecture (4KB for IA-32 and PowerPC; 8KB for Alpha
and Sparc). ReiserFS and Linux's JFS currently support only 4KB blocks,
although JFS's data structures support blocks as small as 512 bytes.
The default block size is 4KB for all of these filesystems except
ext2fs and ext3fs, for which the default is based on the filesystem
size.
You can minimize the space used by files, and hence
maximize the number of files you can fit on a filesystem, by using
smaller block sizes. This practice may slightly degrade performance,
though, as files may become more fragmented and require more pointers
to completely describe the file's location on the disk.
ReiserFS is unusual in that it supports storing file
tails -- the ends of files that don't occupy all of an allocation block
-- from multiple files together in one block. This feature can greatly
enhance ReiserFS's capacity to store many small files, such as those
found on a news server's spool directory. XFS uses a different approach
to achieve a similar benefit -- it stores small files entirely within
the inode (a disk structure that points to the file on disk, holds the
file's time stamp, and so on) whenever possible.
None of these features has much impact when average
file sizes are large. For instance, saving 2KB by storing file tails in
a single allocation block won't be important if a filesystem has just
two 1GB files. If the filesystem has 2,000,000 1KB files, though, such
space-saving features can make a difference between fitting all the
files on a disk or having to buy a new disk.
Another aspect of disk space consumption is the
space devoted to the journal. On most disks, this isn't a major
consideration; however, it is a concern on small disks, such as Zip
disks. On a 100MB Zip disk, ReiserFS devotes 32MB to its journal and
ext3fs and XFS both devote 4MB. JFS devotes less space to its journal
initially, but it may grow with use.
Ext2fs and ext3fs suffer from another problem: By
default, they reserve five percent of their disk space for emergency
use by root. The idea is to give root space to work in case a
filesystem fills up. This may be a reasonable plan for critical
filesystems such as the root filesystem and /var, but for some it's
pointless; for instance, root doesn't need space on /home or on
removable media. The upcoming section, "Creating a Filesystem for
Optimal Performance," describes how to reduce the reserved space.
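To see how much space is currently held back on a mounted filesystem, one rough approach is to compare the free-block counts that the kernel reports for root and for ordinary users. The sketch below uses Python's os.statvfs on Unix-like systems. The function name is hypothetical, and the difference equals the reserved amount only while the filesystem has more free space than the reserve.

```python
import os


def reserved_for_root(path):
    """Return (reserved_bytes, fraction_of_filesystem) for `path`'s filesystem."""
    stats = os.statvfs(path)
    # f_bfree: blocks free for root; f_bavail: blocks free for ordinary users.
    reserved_blocks = stats.f_bfree - stats.f_bavail
    reserved_bytes = reserved_blocks * stats.f_frsize
    fraction = reserved_blocks / stats.f_blocks if stats.f_blocks else 0.0
    return reserved_bytes, fraction


reserved, fraction = reserved_for_root("/")
print(f"reserved for root: {reserved} bytes ({fraction:.1%} of the filesystem)")
```

On a filesystem without a root reserve, the two counts match and the result is zero.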
Supporting the maximum number of files
To some extent, storing the maximum number of files
on a partition is an issue of the efficient allocation of space for
small files, as described in the preceding section, "Minimizing Space
Consumption." Another factor, though, is the number of available
inodes. Most filesystems support a limited number of inodes per disk.
These inodes limit the number of files a disk can hold; each file
requires its own inode, so if you store too many small files on a disk,
you'll run out of inodes. With ext2fs and ext3fs, you can change the
number of inodes using the -i and -N options to mke2fs
when you create the filesystem. These options set the bytes-per-inode
ratio (typically 2 or 4; increasing values decrease the number of
inodes on the filesystem) and the absolute number of inodes,
respectively. With XFS, you can specify the maximum percentage of disk
space that may be allocated to inodes with the maxpct option to
mkfs.xfs. The default value is 25, but if you expect the filesystem to
have very many small files, you can specify a larger percentage.
ReiserFS is unusual in that it allocates inodes
dynamically, so you don't need to be concerned with running out of
inodes. This fact also means that the -i option to the df
utility, which normally returns statistics on used and available
inodes, returns meaningless information about available inodes on
ReiserFS.
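Inode counts can also be read from within a program. As a minimal sketch, Python's os.statvfs exposes the total and free inode counts for the filesystem that holds a given path. The function name is hypothetical, and filesystems that allocate inodes dynamically may report zero or otherwise unhelpful totals.

```python
import os


def inode_usage(path):
    """Return (total, used, free) inode counts for the filesystem holding `path`."""
    stats = os.statvfs(path)
    total = stats.f_files    # total inodes the filesystem reports
    free = stats.f_ffree     # inodes still unallocated
    return total, total - free, free


total, used, free = inode_usage("/")
if total == 0:
    print("this filesystem does not report a fixed inode count")
else:
    print(f"inodes: {used} used of {total} ({used / total:.1%})")
```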
Securing a filesystem with ACLs
Linux, like Unix in general, has traditionally used
file ownership and permissions to control access to files and
directories. Some of the tools for handling these features are
described in Chapter 5, "Doing Real Work in Text Mode." Another way to
control access to files is by using access control lists (ACLs). ACLs
provide finer-grained access control than do ownership and permissions.
ACLs work by attaching additional information -- a list of users or
groups and the permissions to be granted to each -- to the file. For
instance, suppose you have a file that contains confidential data. This
data must be readable and writeable by you and readable by a particular
group (say, readers). You give the file ownership and permissions such
that only you can read or write the file and that anybody in readers
can read it (0640, or -rw-r-----). You need to share this file with
just one other user, though, and for purposes of security for other
files, this user should not be a member of the readers group. ACLs
enable you to do this by giving read permission to this one user,
independently of the readers group. Without ACLs, you would need to
create a new group (say, readers2) that contains all of the members of
readers plus the one extra user. You'd then need to maintain this extra
group. Also, ordinary users can manipulate ACLs, but this isn't usually
the case for groups, so ACLs can greatly simplify matters if users
should be able to give each other access to specific files while still
maintaining restricted access to those files for others.
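The scenario above can be modeled in a few lines of Python. This is a deliberately simplified sketch, not the actual ACL evaluation algorithm: it handles only read access and per-user entries, ignores details such as ACL masks and named-group entries, and the FileEntry class and all user names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    owner: str
    group: str
    mode: int                                      # permission bits, e.g. 0o640
    acl_users: dict = field(default_factory=dict)  # user name -> "r", "rw", ...


def can_read(entry, user, groups):
    """Simplified read check: owner, then per-user ACL entry, group, other."""
    if user == entry.owner:
        return bool(entry.mode & 0o400)
    if user in entry.acl_users:
        return "r" in entry.acl_users[user]
    if entry.group in groups:
        return bool(entry.mode & 0o040)
    return bool(entry.mode & 0o004)


# The confidential file: owner alice, group readers, mode 0640.
secret = FileEntry(owner="alice", group="readers", mode=0o640)
# Grant one extra user read access without adding her to the readers group.
secret.acl_users["carol"] = "r"

for user, groups in [("alice", set()), ("bob", {"readers"}),
                     ("carol", {"staff"}), ("dave", {"staff"})]:
    print(user, can_read(secret, user, groups))
```

The extra user gains read access through a single ACL entry, so no second group needs to be created or maintained.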
Among Linux-native filesystems, only XFS supports ACLs directly. If you need ACLs, though, you can obtain add-on packages
for ext2fs, ext3fs, and JFS. No matter what filesystem you use, you'll
also need support utilities, which are available from the same site.
These tools enable you to define and modify ACLs. For instance,
getfacl displays a file's ACLs, and
setfacl changes a file's ACLs.
ACLs are still quite new in Linux. As such, you may
run into peculiar problems with specific programs or filesystems.
Chances are you don't need ACLs on a typical workstation or a small
server. If you're administering a multiuser system with a complex group
structure, though, you might want to investigate ACLs further. You
might be able to simplify your overall permissions structure by switching to a filesystem that supports ACLs.