Introduction To Inodes

Today we had an interesting problem on a project. We were getting the message ‘disk is full’ despite having plenty of free space. Luckily, my first thought was ‘inodes?’

I logged in and checked inode usage:

$ df -i
Filesystem    Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda1    525312  524844     468  100% /
tmpfs        1007942    1102 1006840    1% /run

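This shows that the root file system (/dev/xvda1) has run out of inodes: only 468 of its 525,312 inodes are free, even though the disk itself still has free space.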
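For scripts or monitoring, the same numbers can be read programmatically. The following is a minimal Python sketch, assuming a POSIX system: os.statvfs exposes the total inode count as f_files and the free count as f_ffree. The helper name inode_usage is our own, not part of any tool.

```python
import os


def inode_usage(path: str = "/") -> dict:
    """Return inode counts for the file system holding `path`, similar to `df -i`."""
    st = os.statvfs(path)
    total = st.f_files          # total number of inodes
    free = st.f_ffree           # free inodes (f_favail is the count usable by non-root)
    used = total - free
    # Round the percentage up, as GNU df does; some file systems report 0 inodes.
    pct = -(-used * 100 // total) if total else 0
    return {"inodes": total, "iused": used, "ifree": free, "iuse_pct": pct}


if __name__ == "__main__":
    print(f"{'Mounted on':<12}{'Inodes':>12}{'IUsed':>12}{'IFree':>12}{'IUse%':>7}")
    for mount in ("/", "/run"):
        if os.path.isdir(mount):
            u = inode_usage(mount)
            print(f"{mount:<12}{u['inodes']:>12}{u['iused']:>12}"
                  f"{u['ifree']:>12}{u['iuse_pct']:>6}%")
```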

High inode usage is usually caused by a massive number of small files. In this case the culprit was session files, which are normally stored somewhere temporary and removed when no longer in use. Either a bug in the code was failing to remove them, or the website was receiving higher traffic than usual.
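To find which directory is holding all those small files, you can count entries per directory. This is a rough Python sketch (the function name biggest_dirs is hypothetical): it walks a tree with os.walk, stays on one file system the way find -xdev does, and counts names, which approximates inode usage (hard links make the two differ slightly).

```python
from __future__ import annotations

import os
from collections import Counter


def biggest_dirs(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Return the `top` directories under `root` with the most entries, largest first."""
    root_dev = os.lstat(root).st_dev
    counts: Counter[str] = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Every name here normally uses (or shares, for hard links) an inode.
        counts[dirpath] = len(dirnames) + len(filenames)
        # Do not descend into other file systems (like `find -xdev`):
        # inodes are counted per file system.
        kept = []
        for name in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev:
                    kept.append(name)
            except OSError:
                pass  # entry vanished or is unreadable
        dirnames[:] = kept
    return [(n, d) for d, n in counts.most_common(top)]


if __name__ == "__main__":
    import sys
    for count, path in biggest_dirs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{count:>8}  {path}")
```

Run it against a suspect tree, for example the directory where the application keeps its session files, to see where the entries are piling up.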

So we can see that inodes are a very important aspect of Linux file systems, and this blog will take you on a detailed journey through them.

What is in an inode?

An inode number points to an inode. An inode is a data structure that stores the following information about a file:

  • Size of the file
  • Device ID
  • User ID of the file’s owner
  • Group ID of the file
  • The file mode information and access privileges for owner, group, and others
  • File protection flags
  • Timestamps: last access (atime), last modification (mtime), and last status change (ctime); ext4 additionally records the creation time
  • Link counter to determine the number of hard links
  • Pointers to the blocks storing the file’s contents

Please note that the file name is not stored in the inode.
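Most of these fields can be read from user space with the stat(2) system call. The Python sketch below, using a hypothetical helper named describe_inode, reads them via os.stat. Two caveats: the block pointers themselves are not exposed this way, and st_ctime on Linux is the inode change time, not the creation time.

```python
import os
import time


def describe_inode(path: str) -> dict:
    """Collect the inode fields listed above, as reported by stat(2)."""
    st = os.stat(path, follow_symlinks=False)   # the name is only used to find the inode
    return {
        "inode": st.st_ino,                  # inode number
        "device": st.st_dev,                 # ID of the device holding the file
        "size": st.st_size,                  # size in bytes
        "uid": st.st_uid,                    # owner's user ID
        "gid": st.st_gid,                    # group ID
        "mode": oct(st.st_mode),             # file type + permission bits
        "links": st.st_nlink,                # number of hard links
        "atime": time.ctime(st.st_atime),    # last access
        "mtime": time.ctime(st.st_mtime),    # last content modification
        "ctime": time.ctime(st.st_ctime),    # last inode change (not creation)
    }


if __name__ == "__main__":
    import sys
    for key, value in describe_inode(sys.argv[1] if len(sys.argv) > 1 else ".").items():
        print(f"{key:>6}: {value}")
```

Notice that nothing in the result is a file name: the path is only the key used to find the inode.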

When a file is created inside a directory, it is assigned a file name and an inode number, and these two values are stored together as an entry in that directory. A user might think that a directory contains the complete files and all the information related to them, but this is not the case. A directory is just a table that maps each file name in the directory to its matching inode number. In this table the first two entries are always “.” and “..”: the first points to the inode of the directory itself, and the second points to the inode of its parent directory.
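You can see the directory-as-table idea directly. This Python sketch (the helper name directory_table is ours) lists a directory's entries as (name, inode number) pairs using os.scandir, whose DirEntry.inode() returns the inode number reported by the directory listing. os.scandir skips “.” and “..”, so the sketch adds them with os.stat.

```python
from __future__ import annotations

import os


def directory_table(path: str) -> list[tuple[str, int]]:
    """Return a directory's (name, inode number) entries, '.' and '..' first."""
    rows = [
        (".", os.stat(path).st_ino),                       # the directory itself
        ("..", os.stat(os.path.join(path, "..")).st_ino),  # its parent
    ]
    with os.scandir(path) as entries:
        # scandir() never yields '.' or '..'; inode() comes from the directory listing
        rows += sorted((entry.name, entry.inode()) for entry in entries)
    return rows


if __name__ == "__main__":
    for name, ino in directory_table("."):
        print(f"{ino:>12}  {name}")
```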

When a user accesses a file or any information related to it, he/she uses the file name, but internally the file name is first looked up in the directory table to find its inode number. That inode number is then used to locate the corresponding inode in the inode table, which maps inode numbers to the inodes themselves.
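The lookup chain of name, then inode number, then inode can be modelled in a few lines. The following is a toy, in-memory model, not how any real kernel or on-disk format is implemented: ToyFS, Inode and lookup are hypothetical names, and the only detail borrowed from ext2 is that the root directory is inode 2.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Inode:
    """Toy inode: metadata and (for directories) a name table, but no name of its own."""
    number: int
    is_dir: bool
    entries: dict[str, int] = field(default_factory=dict)  # name -> inode number


class ToyFS:
    """Hypothetical in-memory model of directories, inode numbers and an inode table."""

    ROOT_INO = 2  # borrowed from ext2, where the root directory is inode 2

    def __init__(self) -> None:
        self.inode_table: dict[int, Inode] = {}   # inode number -> inode
        self._next_ino = self.ROOT_INO
        self.root = self._alloc(is_dir=True)
        self.inode_table[self.root].entries.update({".": self.root, "..": self.root})

    def _alloc(self, is_dir: bool) -> int:
        ino = self._next_ino
        self._next_ino += 1
        self.inode_table[ino] = Inode(ino, is_dir)
        return ino

    def create(self, parent: int, name: str, is_dir: bool = False) -> int:
        """Allocate an inode and record `name -> inode number` in the parent directory."""
        ino = self._alloc(is_dir)
        self.inode_table[parent].entries[name] = ino
        if is_dir:
            self.inode_table[ino].entries.update({".": ino, "..": parent})
        return ino

    def lookup(self, path: str) -> Inode:
        """Resolve an absolute path: each name gives an inode number, the table gives the inode."""
        ino = self.root
        for name in path.split("/"):
            if not name:
                continue  # skip empty components from leading or repeated slashes
            directory = self.inode_table[ino]
            if not directory.is_dir or name not in directory.entries:
                raise FileNotFoundError(path)
            ino = directory.entries[name]
        return self.inode_table[ino]


if __name__ == "__main__":
    fs = ToyFS()
    home = fs.create(fs.root, "home", is_dir=True)
    notes = fs.create(home, "notes.txt")
    print(fs.lookup("/home/notes.txt").number == notes)   # True
    print(fs.lookup("/home/..").number == fs.root)         # True
```

The design point the model makes: the name "notes.txt" lives only in the entries of the home directory's inode, never in the inode it names.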

Why is the file name not stored in the inode?

As pointed out earlier, the inode has no entry for the file name. Separating the file name from the rest of the file's information is what makes hard links possible: because the name lives only in the directory entry, several file names can point to the same inode.

For example:

$ touch file1
$ ln file1 file2
$ ls -li
total 0
52299066 -rw-r--r-- 2 root root 0 Feb 19 15:36 file1
52299066 -rw-r--r-- 2 root root 0 Feb 19 15:36 file2

In the above output, we created a file ‘file1’ and then a hard link to it named ‘file2’. Running ‘ls -li’ shows the details of both files: they have the same inode number (52299066), and the link count in the third column is 2. Note that hard links cannot be created across different file systems, and they cannot be created for directories.
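The same experiment can be run from Python. This sketch, assuming a Linux system and a writable temporary directory, uses os.link (the system call behind ln) and also shows that deleting one name leaves the inode, and its data, reachable through the other. hard_link_demo is a hypothetical helper name.

```python
import os
import tempfile


def hard_link_demo() -> tuple[bool, int, str, int]:
    """Recreate `touch file1; ln file1 file2`, then delete file1 and read file2."""
    with tempfile.TemporaryDirectory() as d:
        file1 = os.path.join(d, "file1")
        file2 = os.path.join(d, "file2")
        with open(file1, "w") as f:
            f.write("hello\n")
        os.link(file1, file2)                 # new directory entry, same inode
        s1, s2 = os.stat(file1), os.stat(file2)
        same_inode = s1.st_ino == s2.st_ino
        links_before = s1.st_nlink            # 2: two names point at the inode
        os.remove(file1)                      # removes a name, not the inode
        with open(file2) as f:
            content = f.read()
        links_after = os.stat(file2).st_nlink
    return same_inode, links_before, content, links_after


print(hard_link_demo())   # (True, 2, 'hello\n', 1)
```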

When are Inodes created?

As we now know, an inode is a data structure that contains information about a file. Since data structures occupy storage, an obvious question arises: when are the inodes created? On ext2/ext3/ext4, space for inodes is allocated when the file system is created (for example, when the operating system is installed or a new partition is formatted) and laid out during its initial structuring. This means that the maximum number of inodes, and hence the maximum number of files, is fixed when the file system is created.

$ df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
udev         977K   516  977K    1% /dev
tmpfs        985K  1011  984K    1% /run
/dev/sda2     59M  272K   58M    1% /
tmpfs        985K   116  985K    1% /dev/shm
tmpfs        985K     5  985K    1% /run/lock
tmpfs        985K    18  985K    1% /sys/fs/cgroup
/dev/loop0    27K   27K     0  100% /snap/gtk-common-themes/319
/dev/loop1   1.3K  1.3K     0  100% /snap/gnome-calculator/180

Using the command ‘df -hi’, we can see each file system's maximum number of inodes and its current inode usage; the -h flag prints the counts in human-readable units such as K and M.

This concept brings up another interesting fact. A file system can run out of space in two ways:

  • No space is left for adding new data
  • All the inodes are consumed

The first way is pretty obvious, but the second deserves a closer look. It is possible to have free storage space and still be unable to add any new data to the file system because all the inodes are consumed. This can happen when the file system contains a very large number of very small files: every file needs its own inode, so the inodes run out while the hard disk still has free space.

This case is possible but less common, because the number of inodes is chosen with a typical average file size in mind, so most systems run out of disk space first. The number of inodes is computed when the file system is created, taking into account the size of the file system and the expected average file size; mke2fs, for example, creates one inode for every bytes-per-inode bytes of space (its -i option) unless an explicit inode count is given with -N.
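A back-of-the-envelope model of this sizing follows. It assumes a fixed bytes-per-inode ratio of 16K (the inode_ratio default in many distributions' /etc/mke2fs.conf, but treat it as an assumption and check your own system) and a 4K block size. The helper names are hypothetical, and the sketch ignores space taken by metadata, so its numbers are estimates only.

```python
KIB = 1024
GIB = 1024 ** 3


def estimated_inodes(fs_bytes: int, bytes_per_inode: int = 16 * KIB) -> int:
    """Rough inode count: one inode per `bytes_per_inode` bytes of file system (assumed ratio)."""
    return fs_bytes // bytes_per_inode


def runs_out_first(fs_bytes: int, avg_file_bytes: int,
                   bytes_per_inode: int = 16 * KIB, block_size: int = 4 * KIB) -> str:
    """Predict whether inodes or space run out first if the disk fills with equal-sized files."""
    # Round each file up to whole blocks; treat empty files as one block to avoid dividing by 0.
    blocks_per_file = max(1, -(-avg_file_bytes // block_size))
    files_by_space = fs_bytes // (blocks_per_file * block_size)
    files_by_inodes = estimated_inodes(fs_bytes, bytes_per_inode)
    return "inodes" if files_by_inodes < files_by_space else "space"


print(estimated_inodes(8 * GIB))            # 524288
print(runs_out_first(8 * GIB, 2 * KIB))     # inodes: room for 2M one-block files, but 524288 inodes
print(runs_out_first(8 * GIB, 64 * KIB))    # space: 131072 such files fill the disk first
```

The rule of thumb that falls out: inodes run out first only when the average file (rounded up to whole blocks) is smaller than the bytes-per-inode ratio.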

Inode Structure of a File

Now let’s see what the inode of a file looks like.

Mode: This stores two things: the permission information, and the type of the inode. For example, an inode can belong to a regular file, a directory, a block device, etc.
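The type and the permission bits share one integer, st_mode. This Python sketch (decode_mode is a hypothetical helper) splits them with the standard stat module, which also provides stat.filemode to render the familiar ls-style string.

```python
from __future__ import annotations

import stat

FILE_TYPES = {
    stat.S_IFREG: "regular file",
    stat.S_IFDIR: "directory",
    stat.S_IFLNK: "symbolic link",
    stat.S_IFBLK: "block device",
    stat.S_IFCHR: "character device",
    stat.S_IFIFO: "FIFO",
    stat.S_IFSOCK: "socket",
}


def decode_mode(mode: int) -> tuple[str, str]:
    """Split an inode's mode into its file type and its permission bits."""
    kind = FILE_TYPES.get(stat.S_IFMT(mode), "unknown")   # upper bits: inode type
    perms = oct(stat.S_IMODE(mode))                       # lower bits: permissions
    return kind, perms


print(decode_mode(0o100644))    # ('regular file', '0o644')
print(stat.filemode(0o100644))  # -rw-r--r--  (the form ls -l shows)
```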

Owner Info: Ownership details, such as the user and the group that own the file.

Size: This field stores the size of the file in bytes.

Time Stamps: These record when the file was last accessed, when its contents were last modified, and when the inode itself was last changed. Now comes the important part: understanding how a file is saved in a partition with the help of an inode.

Block Size: When a partition is formatted with a file system, it normally gets a default block size. The block size is the size of the chunks in which data is stored. So if the block size is 4K, a 15K file takes 4 blocks (because 4K * 4 = 16K, while 3 blocks give only 12K).
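The block arithmetic is just a ceiling division. Here is a tiny Python sketch, assuming 4K blocks and ignoring metadata blocks and features such as ext4's inline data; blocks_needed is a hypothetical name.

```python
KIB = 1024


def blocks_needed(file_size: int, block_size: int = 4 * KIB) -> int:
    """Whole data blocks a file of `file_size` bytes occupies (ceiling division)."""
    return -(-file_size // block_size)


print(blocks_needed(15 * KIB))   # 4: 16K allocated, 1K of the last block unused
print(blocks_needed(16 * KIB))   # 4
print(blocks_needed(1))          # 1: a 1-byte file still takes a whole block
```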

Direct Block Pointers:

In an ext2 file system, an inode contains 15 block pointers (ext3 uses the same scheme; ext4 uses extents by default). The first 12 are direct block pointers: each holds the address of a block containing the file's data. 12 block pointers can point to 12 data blocks, so with 4K blocks the direct block pointers can address only 48K of data in total. This means that if the file is 48K or smaller, the inode itself can address all the blocks containing the file's data.

Indirect Block Pointers:

Whenever the size of the data goes above 48K, the 13th pointer in the inode is used. It points to an indirect block: a block that holds no file data itself, only the addresses of further data blocks.

As we have taken our block size as 4K, and a block pointer takes 4 bytes, one indirect block can hold 1024 pointers (4 bytes * 1024 = 4K), each pointing to a 4K data block.

This means an indirect block pointer can address up to 1024 * 4K = 4M of data.
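The capacity arithmetic in this and the next two sections follows one rule: each extra level of indirection multiplies the reach by the number of pointers that fit in one block. A small Python sketch, assuming 4K blocks and 4-byte pointers as in the text (pointer_capacities is a hypothetical name):

```python
from __future__ import annotations

KIB = 1024


def pointer_capacities(block_size: int = 4 * KIB, ptr_size: int = 4) -> dict[str, int]:
    """Bytes of file data reachable through each kind of pointer in an ext2-style inode."""
    per_block = block_size // ptr_size  # pointers held by one indirect block (1024 here)
    return {
        "direct": 12 * block_size,              # 12 pointers straight to data blocks
        "single": per_block * block_size,       # 1 block of pointers to data blocks
        "double": per_block ** 2 * block_size,  # pointers to single indirect blocks
        "triple": per_block ** 3 * block_size,  # pointers to double indirect blocks
    }


for kind, size in pointer_capacities().items():
    print(f"{kind:>6}: {size:>16,} bytes")
```

Passing a different block_size shows why larger blocks raise the limits so quickly: both the data block and the number of pointers per block grow.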

Double Indirect Block Pointers:

If the size of the file is above 4M + 48K, the inode starts using the double indirect block pointer to address data blocks. The double indirect block pointer in the inode points to a block of pointers to indirect blocks, which in turn point to the blocks where the data is stored.

The double indirect block is also a 4K block, like every block here, and block pointers are 4 bytes in size, as mentioned previously, so it can hold 1024 pointers to indirect blocks. Each indirect block addresses 4M of data, so with the help of the double indirect block pointer the size of the data can go up to 1024 * 4M = 4G.

Triple Indirect Block Pointers:

The triple indirect block pointer can address a further 4G * 1024 = 4T of file data. The fifteenth block pointer in the inode points to a block of 1024 pointers to double indirect blocks, and it comes into use once the file grows beyond 48K + 4M + 4G.
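Adding the four kinds of pointer together gives the maximum file size the pointer scheme can address, with block size B = 4096 bytes and pointer size p = 4 bytes (so B/p = 1024 pointers per block). This derivation covers the pointer layout only; in practice ext2 caps files at 2T with 4K blocks because of other on-disk limits.

```latex
\begin{aligned}
S_{\max} &= 12B + \frac{B}{p}\,B + \left(\frac{B}{p}\right)^{2} B + \left(\frac{B}{p}\right)^{3} B \\
         &= 12 \cdot 4\,\text{K} + 1024 \cdot 4\,\text{K} + 1024^{2} \cdot 4\,\text{K} + 1024^{3} \cdot 4\,\text{K} \\
         &= 48\,\text{K} + 4\,\text{M} + 4\,\text{G} + 4\,\text{T} \;\approx\; 4\,\text{T}
\end{aligned}
```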

So after the 12 direct block pointers, the 13th pointer in the inode is the indirect block pointer, the 14th is the double indirect block pointer, and the 15th is the triple indirect block pointer.
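To tie the four kinds of pointer together, here is a Python sketch that maps a file's logical block number to the pointer slot it goes through and the index followed at each level. It assumes the ext2-style layout described above with 4K blocks and 4-byte pointers; locate_block is a hypothetical name.

```python
from __future__ import annotations


def locate_block(n: int, block_size: int = 4096, ptr_size: int = 4) -> tuple[str, list[int]]:
    """Map logical block `n` of a file to (pointer kind, index path).

    The first index is the slot in the inode's 15-entry pointer array (0-based, so
    slots 12, 13, 14 are the 13th, 14th and 15th pointers); the rest are indexes
    inside each indirect block along the way.
    """
    if n < 0:
        raise ValueError("block number must be non-negative")
    per = block_size // ptr_size        # pointers per indirect block: 1024 here
    if n < 12:
        return "direct", [n]
    n -= 12
    if n < per:
        return "single", [12, n]
    n -= per
    if n < per ** 2:
        return "double", [13, n // per, n % per]
    n -= per ** 2
    if n < per ** 3:
        return "triple", [14, n // per ** 2, (n // per) % per, n % per]
    raise ValueError("block number is beyond what 15 pointers can address")


# The byte at offset 5M lives in logical block 5M // 4K = 1280.
print(locate_block(5 * 1024 * 1024 // 4096))   # ('double', [13, 0, 244])
```

The 5M example matches the text: anything beyond 4M + 48K is reached through the double indirect pointer.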

Conclusion

Every Linux file has an inode, and the inode contains all the properties of the file except its name. There is no difference between the original file and a hard link: both names refer to the same inode.

Blog Pundits: Sandeep Rawat

OpsTree is an End-to-End DevOps Solution Provider.
