BLOCKS

A partition, disk, file, or block device formatted with a Second Extended Filesystem is divided into small groups of sectors called "blocks". These blocks are then grouped into larger units called block groups. The block size is usually chosen when the disk is formatted, and it affects performance, maximum file size, and maximum file system size. Block sizes commonly implemented include 1KiB, 2KiB, 4KiB, and 8KiB, although provisions in the superblock allow for block sizes as big as 1024*(2^31)-1. Depending on the implementation, some architectures may impose limits on which block sizes are supported. For example, a Linux 2.6 implementation on DEC Alpha uses blocks of 8KiB, but the same implementation on an Intel 386 processor supports a maximum block size of 4KiB.

BLOCK GROUPS

Blocks are clustered into block groups in order to reduce fragmentation and minimise the amount of head seeking when reading a large amount of consecutive data. Information about each block group is kept in a descriptor table stored in the block(s) immediately after the superblock. Two blocks near the start of each group are reserved for the block usage bitmap and the inode usage bitmap, which show which blocks and inodes are in use. Since each bitmap is limited to a single block and uses one bit per block, a block group can contain at most 8 times as many blocks as there are bytes in a block (for example, 8192 blocks per group with 1KiB blocks). The block(s) following the bitmaps in each block group are designated as the inode table for that block group, and the remainder are the data blocks. The block allocation algorithm attempts to allocate data blocks in the same block group as the inode that owns them.

DIRECTORIES

A directory is a filesystem object and has an inode just like a file. It is a specially formatted file containing records which associate each name with an inode number.
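A directory block can be walked record by record. The Python sketch below assumes the Linux `ext2_dir_entry_2` record layout, which this text does not spell out: a 32-bit inode number, a 16-bit record length, an 8-bit name length, an 8-bit type byte, then the name, all little-endian. The type byte exists only in revisions that store the object type in the record; revision 0 uses a 16-bit name length instead. The names `parse_dir_block` and `FILE_TYPES` are hypothetical.

```python
import struct

# Assumed on-disk layout of one directory record (Linux ext2_dir_entry_2),
# little-endian: inode (u32), rec_len (u16), name_len (u8), file_type (u8),
# followed by name_len bytes of name and padding up to rec_len.
DIRENT_HEADER = struct.Struct("<IHBB")

# Type codes stored in file_type by revisions that support it (0 = unknown).
FILE_TYPES = {1: "regular", 2: "directory", 3: "char device",
              4: "block device", 5: "fifo", 6: "socket", 7: "symlink"}

def parse_dir_block(block: bytes):
    """Yield (inode, file_type, name) for every in-use record in one block."""
    offset = 0
    while offset + DIRENT_HEADER.size <= len(block):
        inode, rec_len, name_len, file_type = DIRENT_HEADER.unpack_from(block, offset)
        if rec_len < DIRENT_HEADER.size:
            break  # corrupt or zeroed record: stop instead of looping forever
        if inode != 0:  # inode 0 marks an unused record
            start = offset + DIRENT_HEADER.size
            name = block[start:start + name_len].decode("utf-8", "replace")
            yield inode, file_type, name
        offset += rec_len  # rec_len chains each record to the next
```

Because each record's `rec_len` points at the next one, deleting a name can simply enlarge the previous record or zero the inode field, which is why the sketch skips records whose inode is 0.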
Later revisions of the filesystem also encode the type of the object (file, directory, symlink, device, fifo, socket) in the directory record, avoiding the need to check the inode itself for this information. The inode allocation code should try to assign inodes which are in the same block group as the directory in which they are first created.

The original ext2 revision used singly-linked lists to store the filenames in the directory; newer revisions are able to use hashes and binary trees. Also note that as a directory grows, additional blocks are assigned to store the additional file records. When filenames are removed, some implementations do not free these additional blocks.

INODES

The inode (index node) is a fundamental concept in the ext2 filesystem. Each object in the filesystem is represented by an inode. The inode structure contains pointers to the filesystem blocks which contain the data held in the object, and all of the metadata about the object: the permissions, owner, group, flags, size, number of blocks used, access time, change time, modification time, deletion time, number of links, fragments, version (for NFS), and extended attributes (EAs) and/or Access Control Lists (ACLs).

There are some reserved fields in the inode structure which are currently unused, and several which are overloaded. One field holds the directory ACL if the inode is a directory, or alternatively the top 32 bits of the file size if the inode is a regular file (allowing file sizes larger than 2GB). The translator field is unused under Linux, but is used by the HURD to reference the inode of a program which will be used to interpret this object. Most of the remaining reserved fields have been used up by both Linux and the HURD for larger owner and group fields. The HURD also has a larger mode field, so it uses another of the remaining fields to store the extra bits.

The inode holds pointers to the first 12 blocks which contain the file's data.
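The overloaded size field and the block pointer array can be read straight out of a raw inode. The Python sketch below assumes the standard 128-byte inode layout, with offsets (noted in the docstring) that this text does not give; `decode_inode` is a hypothetical name, and only a handful of fields are decoded.

```python
import struct

S_IFMT = 0xF000   # mask for the file-type bits of i_mode
S_IFREG = 0x8000  # regular file
N_DIRECT = 12     # i_block[0..11] are direct pointers

def decode_inode(raw: bytes) -> dict:
    """Decode selected fields of a raw ext2 inode (at least 128 bytes).

    Offsets assume the standard 128-byte inode: i_mode at 0, i_size at 4,
    i_links_count at 26, i_block[15] at 40 and i_dir_acl at 108.
    """
    (mode,) = struct.unpack_from("<H", raw, 0)
    (size_lo,) = struct.unpack_from("<I", raw, 4)
    (links_count,) = struct.unpack_from("<H", raw, 26)
    i_block = struct.unpack_from("<15I", raw, 40)
    (overloaded,) = struct.unpack_from("<I", raw, 108)
    is_regular = (mode & S_IFMT) == S_IFREG
    # Regular files: the overloaded field holds the top 32 bits of the size.
    # Directories: it is the directory ACL, so the size stays 32-bit.
    size = ((overloaded << 32) | size_lo) if is_regular else size_lo
    return {
        "mode": mode,
        "size": size,
        "links_count": links_count,
        "direct": list(i_block[:N_DIRECT]),
        "indirect": i_block[12],
        "double_indirect": i_block[13],
        "triple_indirect": i_block[14],
    }
```

The last three entries of `i_block` are the indirect, doubly-indirect and trebly-indirect pointers; following them requires reading further blocks from the volume, which this sketch leaves out.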
There is a pointer to an indirect block (which contains pointers to the next set of blocks), a pointer to a doubly-indirect block (which contains pointers to indirect blocks) and a pointer to a trebly-indirect block (which contains pointers to doubly-indirect blocks).

Some filesystem-specific behavior flags are also stored, allowing filesystem behavior to be controlled on a per-file basis. There are flags for secure deletion, undeletable, compression, synchronous updates, immutability, append-only, dumpable, no-atime, indexed directories, and data-journaling. Many of these flags, like journaling, have been implemented in newer filesystems such as ext3 and ext4, while some others are still under development. All the inodes are stored in inode tables, with one inode table per block group.

SUPERBLOCKS

The superblock contains all the information about the configuration of the filesystem. It includes fields such as the total number of inodes and blocks in the filesystem and how many are free, how many inodes and blocks are in each block group, when the filesystem was mounted (and if it was cleanly unmounted), when it was modified, what version of the filesystem it is, and which OS created it.

The primary copy of the superblock is stored at an offset of 1024 bytes from the start of the device, and it is essential to mounting the filesystem. Since it is so important, backup copies of the superblock are stored in block groups throughout the filesystem. The first version of ext2 (revision 0) stores a copy at the start of every block group, along with backups of the group descriptor block(s). Because this can consume a considerable amount of space for large filesystems, later revisions can optionally reduce the number of backup copies by only putting backups in specific groups (this is the sparse superblock feature). The groups chosen are 0, 1 and powers of 3, 5 and 7.
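The sparse superblock rule can be written as a simple predicate. This Python sketch encodes the rule exactly as stated above (groups 0, 1 and powers of 3, 5 and 7) and treats a filesystem without the feature as revision-0 style, with a copy in every group; both function names are hypothetical.

```python
def _is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 1."""
    if n < base:
        return False
    while n % base == 0:
        n //= base
    return n == 1

def has_superblock_backup(group: int, sparse_super: bool = True) -> bool:
    """Whether a block group carries a superblock copy under the stated rule."""
    if not sparse_super:
        return True  # revision 0 style: every group holds a copy
    if group in (0, 1):
        return True
    return any(_is_power_of(group, base) for base in (3, 5, 7))
```

For the first 50 groups this selects 0, 1, 3, 5, 7, 9, 25, 27 and 49, so the number of copies grows only logarithmically with the size of the filesystem.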
Revision 1 and higher of the filesystem also store extra fields, such as the volume name, a unique identification number, the inode size, and space for optional filesystem features to store configuration info. All fields in the superblock (as in all other ext2 structures) are stored on disk in little-endian format, so a filesystem is portable between machines without having to know what machine it was created on.

SYMBOLIC LINKS

A symbolic link (also symlink or soft link) is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path, and that affects pathname resolution. Symbolic links operate transparently for most operations: programs which read or write to files named by a symbolic link will behave as if operating directly on the target files. However, programs that need to handle symbolic links specially (e.g., backup utilities) may identify and manipulate them directly.

A symbolic link merely contains a text string that is interpreted and followed by the operating system as a path to another file or directory. It is a file in its own right and can exist independently of its target. Symbolic links do not affect an inode's link count. If a symbolic link is deleted, its target remains unaffected. If the target is moved, renamed, or deleted, any symbolic link that used to point to it continues to exist but now points to a nonexistent file. Symbolic links pointing to nonexistent files are sometimes called "orphaned" or "dangling".

Symbolic links are also filesystem objects with inodes. For symlinks shorter than 60 bytes, the target path is stored within the inode itself, in the fields which would normally hold the pointers to data blocks (15 four-byte pointers give exactly 60 bytes). This is a worthwhile optimisation, as it avoids allocating a full block for the symlink, and most symlinks are less than 60 characters long. Symbolic links can also point to files or directories on other partitions and file systems.
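Reading an inline ("fast") symlink only requires the inode. The Python sketch below applies the 60-byte rule described above and assumes the standard inode offsets (i_mode at 0, i_size at 4, the i_block array at 40). Some implementations instead decide whether a symlink is inline by checking whether the inode has any data blocks allocated, so treat the size test as an assumption. `fast_symlink_target` is a hypothetical name.

```python
import struct

S_IFMT = 0xF000
S_IFLNK = 0xA000        # symbolic link
I_BLOCK_OFFSET = 40     # start of the i_block[15] pointer array (assumed offset)
I_BLOCK_BYTES = 60      # 15 pointers x 4 bytes

def fast_symlink_target(raw: bytes):
    """Return an inline symlink target, or None if it is not stored inline.

    Applies the rule described above: a target shorter than 60 bytes is kept
    in the i_block area instead of a data block.
    """
    (mode,) = struct.unpack_from("<H", raw, 0)
    (size,) = struct.unpack_from("<I", raw, 4)
    if (mode & S_IFMT) != S_IFLNK or size >= I_BLOCK_BYTES:
        return None  # not a symlink, or the target lives in a data block
    target = raw[I_BLOCK_OFFSET:I_BLOCK_OFFSET + size]
    return target.decode("utf-8", "surrogateescape")
```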
DISK ORGANIZATION

An ext2 filesystem starts with a superblock located at byte offset 1024 from the start of the volume. This is block 1 for a volume formatted with 1KiB blocks, or falls within block 0 for larger block sizes. Note that the size of the superblock is constant regardless of the block size. The block(s) following the superblock hold the Block Group Descriptor Table, which provides an overview of how the volume is split into block groups and where to find the inode bitmap, the block bitmap, and the inode table for each block group.

In revision 0 of ext2, each block group consists of a copy of the superblock, a copy of the block group descriptor table, a block bitmap, an inode bitmap, an inode table, and data blocks. With the introduction of revision 1 and the sparse superblock feature in ext2, only specific block groups contain copies of the superblock and block group descriptor table. All block groups still contain the block bitmap, inode bitmap, inode table, and data blocks. The shadow copies of the superblock can be located in block groups 0, 1 and powers of 3, 5, and 7.

The block bitmap and inode bitmap are limited to 1 block each per block group, so the total number of blocks per block group is limited as well (more information in the Block Size Impact table). Each data block may also be further divided into "fragments". As of Linux 2.6.28, support for fragments was still not implemented in the kernel; it is therefore suggested to ensure the fragment size is equal to the block size so as to maintain compatibility.
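Given a few superblock values (s_blocks_count, s_first_data_block and s_blocks_per_group, described in the SUPERBLOCK section), the block group geometry follows from simple arithmetic. This Python sketch shows that arithmetic; all function names are hypothetical, and it assumes the descriptor table begins in the block right after the one holding the superblock, as stated above.

```python
def max_blocks_per_group(block_size: int) -> int:
    """Upper bound set by the one-block bitmap: one bit per block."""
    return block_size * 8

def group_count(blocks_count: int, first_data_block: int, blocks_per_group: int) -> int:
    """Number of block groups; the last group may be only partly filled."""
    return -(-(blocks_count - first_data_block) // blocks_per_group)  # ceiling division

def group_of_block(block: int, first_data_block: int, blocks_per_group: int) -> int:
    """Block group that contains the given absolute block number."""
    return (block - first_data_block) // blocks_per_group

def group_first_block(group: int, first_data_block: int, blocks_per_group: int) -> int:
    """Absolute block number where the given block group starts."""
    return first_data_block + group * blocks_per_group

def descriptor_table_block(first_data_block: int) -> int:
    """The group descriptor table starts in the block after the superblock's."""
    return first_data_block + 1
```

With 1KiB blocks (s_first_data_block = 1) the descriptor table lands in block 2; with larger blocks (s_first_data_block = 0) it lands in block 1.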
Example layout

Block Offset   Length        Description
------------------------------------------------------------------
byte 0         512 bytes     Boot record (if present)
byte 512       512 bytes     Additional boot record data (if present)
byte 1024      1024 bytes    Superblock
block 2        1 block       Block group descriptor table
block 3        1 block       Block bitmap
block 4        1 block       Inode bitmap
block 5        23 blocks     Inode table
block 28       1412 blocks   Data blocks

For the curious, block 0 always starts at the first sector of the disk or partition and will always contain the boot record if one is present. The superblock is always located at byte offset 1024 from the start of the disk or partition. In a file system formatted with 1KiB blocks this is block 1, but in file systems with larger block sizes it is always in block 0 (at offset 1024 within block 0).

The layout on disk is very predictable as long as you know a few basic parameters: block size, blocks per group, and inodes per group. This information is all located in, or can be computed from, the superblock structure. Nevertheless, unless the image was crafted with controlled parameters, the position of the various structures on disk (except the superblock) should never be assumed. Always load the superblock first.

Notice how block 0 is not part of block group 0 in 1KiB block size file systems. The reason for this is that block group 0 always starts with the block containing the superblock. Hence on 1KiB block systems, block group 0 starts at block 1, but on larger block sizes it starts at block 0. For more information, see the s_first_data_block superblock entry.

SUPERBLOCK

The superblock is always located at byte offset 1024 from the beginning of the file, block device, or partition formatted with ext2 and later variants (ext3, ext4). Its structure is mostly constant from ext2 to ext3 and ext4, with only some minor changes.
Offset  Bytes     Description
-------------------------------------------------------
0x0000  4         s_inodes_count
0x0004  4         s_blocks_count
0x0008  4         s_r_blocks_count
0x000C  4         s_free_blocks_count
0x0010  4         s_free_inodes_count
0x0014  4         s_first_data_block
0x0018  4         s_log_block_size
0x001C  4         s_log_frag_size
0x0020  4         s_blocks_per_group
0x0024  4         s_frags_per_group
0x0028  4         s_inodes_per_group
0x002C  4         s_mtime
0x0030  4         s_wtime
0x0034  2         s_mnt_count
0x0036  2         s_max_mnt_count
0x0038  2         s_magic
0x003A  2         s_state
0x003C  2         s_errors
0x003E  2         s_minor_rev_level
0x0040  4         s_lastcheck
0x0044  4         s_checkinterval
0x0048  4         s_creator_os
0x004C  4         s_rev_level
0x0050  2         s_def_resuid
0x0052  2         s_def_resgid
0x0054  4         s_first_ino
0x0058  2         s_inode_size
0x005A  2         s_block_group_nr
0x005C  4         s_feature_compat
0x0060  4         s_feature_incompat
0x0064  4         s_feature_ro_compat
0x0068  16        s_uuid
0x0078  16        s_volume_name
0x0088  64        s_last_mounted
0x00C8  4         s_algo_bitmap
0x00CC  1         s_prealloc_blocks
0x00CD  1         s_prealloc_dir_blocks
0x00CE  2         (alignment)
0x00D0  16        s_journal_uuid
0x00E0  4         s_journal_inum
0x00E4  4         s_journal_dev
0x00E8  4         s_last_orphan
0x00EC  4x4 (16)  s_hash_seed
0x00FC  1         s_def_hash_version
0x00FD  3         padding - reserved for future expansion
0x0100  4         s_default_mount_options
0x0104  4         s_first_meta_bg
0x0108  760       Unused - reserved for future expansion

s_inodes_count
32-bit value indicating the total number of inodes, both used and free, in the file system. This value must be less than or equal to (s_inodes_per_group * number of block groups). It must be equal to the sum of the inodes defined in each block group.

s_blocks_count
32-bit value indicating the total number of blocks in the file system, including all used, free, and reserved blocks. This value must be less than or equal to (s_blocks_per_group * number of block groups). It must be equal to the sum of the blocks defined in each block group.
s_r_blocks_count
32-bit value indicating the total number of blocks reserved for the usage of the super user. This is most useful if, for some reason, a user, maliciously or not, fills the file system to capacity; the super user will then have this specified number of free blocks at their disposal to edit and save configuration files.

s_free_blocks_count
32-bit value indicating the total number of free blocks, including the number of reserved blocks (see s_r_blocks_count). This is the sum of all free blocks of all the block groups.

s_free_inodes_count
32-bit value indicating the total number of free inodes. This is the sum of all free inodes of all the block groups.

s_first_data_block
32-bit value identifying the first data block, in other words the id of the block containing the superblock structure. NOTE: this is 1 for a block size of 1KiB, otherwise 0.

s_log_block_size
The block size is computed using this 32-bit value as the number of bits to shift the value 1024 to the left (block size = 1024 << s_log_block_size). This value may not be negative.

s_log_frag_size
The fragment size is computed using this 32-bit value as the number of bits to shift the value 1024 to the left. Negative values shift the value right rather than left.

s_blocks_per_group
32-bit value indicating the total number of blocks per group. This value, in combination with s_first_data_block, can be used to determine the block group boundaries.

s_frags_per_group
32-bit value indicating the total number of fragments per group. It is also used to determine the size of the block bitmap of each block group.

s_inodes_per_group
32-bit value indicating the total number of inodes per group. This is also used to determine the size of the inode bitmap of each block group. Note that you cannot have more than (block size in bytes * 8) inodes per group, as the inode bitmap must fit within a single block. This value must be a perfect multiple of the number of inodes that can fit in a block (block size / s_inode_size).
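Putting the field descriptions together, the Python sketch below reads the primary superblock from a raw image at byte 1024, using the offsets from the table above. It checks s_magic against 0xEF53 (the ext2 magic number, which this text does not state), derives the block and fragment sizes from the shift fields, and applies the s_inodes_per_group constraints. The fallback to a 128-byte inode size for revision 0 (where s_inode_size is not defined) and the name `read_superblock` are assumptions of this sketch.

```python
import struct

EXT2_SUPER_MAGIC = 0xEF53   # expected value of s_magic
SUPERBLOCK_OFFSET = 1024    # always at byte 1024, whatever the block size
GOOD_OLD_INODE_SIZE = 128   # assumed inode size for revision 0 filesystems

def read_superblock(image: bytes) -> dict:
    """Decode core ext2 superblock fields from a raw volume image (sketch)."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("image too small to hold a superblock")
    # Offsets 0x00-0x2B: eleven 32-bit fields; s_log_frag_size is signed.
    (inodes_count, blocks_count, r_blocks_count, free_blocks_count,
     free_inodes_count, first_data_block, log_block_size, log_frag_size,
     blocks_per_group, frags_per_group,
     inodes_per_group) = struct.unpack_from("<7Ii3I", sb, 0x00)
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError(f"bad s_magic 0x{magic:04X}")
    (rev_level,) = struct.unpack_from("<I", sb, 0x4C)
    (inode_size,) = struct.unpack_from("<H", sb, 0x58)
    if rev_level == 0:
        inode_size = GOOD_OLD_INODE_SIZE  # s_inode_size is a revision 1 field

    block_size = 1024 << log_block_size
    frag_size = (1024 << log_frag_size if log_frag_size >= 0
                 else 1024 >> -log_frag_size)

    # Constraints from the s_inodes_per_group description above.
    if inode_size == 0 or block_size % inode_size:
        raise ValueError("implausible s_inode_size")
    if inodes_per_group > block_size * 8:
        raise ValueError("inode bitmap would not fit in one block")
    if inodes_per_group % (block_size // inode_size):
        raise ValueError("s_inodes_per_group is not a multiple of inodes per block")

    return {
        "inodes_count": inodes_count, "blocks_count": blocks_count,
        "r_blocks_count": r_blocks_count, "free_blocks_count": free_blocks_count,
        "free_inodes_count": free_inodes_count, "first_data_block": first_data_block,
        "block_size": block_size, "frag_size": frag_size,
        "blocks_per_group": blocks_per_group, "frags_per_group": frags_per_group,
        "inodes_per_group": inodes_per_group, "rev_level": rev_level,
        "inode_size": inode_size,
    }
```

Reading only the first 1024 bytes after offset 1024 is enough for every field in the table, which is why the superblock can be located before the block size is known.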