mhddfs: join several filesystems together to form a single larger one

2020 update

MHDDFS appears to have some long-stanging reliability issues (Segmentation faults while traversing directories). For that reason I currently use MergerFS instead. The base concept is exactly the same, except it works without those faults, and has a somewhat different feature set and configuration syntax.

The original article

Suppose, you have three hard drives - sized 80, 40 and 60 GB. And 150 GB of music files, which you need to store on these drives. How would you do it?

The two solutions I knew of, were:

  • either to simply have three separate “Music” folders - one per each drive;
  • or create some sort of RAID, joining all the drives into an array.

However, the first method is quite tiresome, as one needs to decide how to split the data between the drives and keep track of what is stored where. For example, I might decide to store all “Classical” music on the first disk, and “Rock” music on the second. Then, suddenly, the first drive fills up and the second one still has plenty of space. Now I need to move the files between the disks, or jump around with symlinks.

The RAID method, while solving this problem, always incurs significant loss of either storage reliability or usable disk space.

But recently, I found a better solution to this problem and similar ones: mhddfs. It is a FUSE filesystem module which allows to combine several smaller filesystems into one big “virtual” one, which will contain all the files from all its members, and all their free space. Even better, unlike other similar modules (unionfs?), this one does not limit the ablility to add new files on the combined filesystem and intellegently manages, where those files will be placed.

The package is called «mhddfs» and is readily available in Debian and probably in many other distros as well.

Let's say the three hard drives you have are mounted at /mnt/hdd1 /mnt/hdd2 and /mnt/hdd3. Then, you might have something akin to the following:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              80G   50G   30G  63% /mnt/hdd1
/dev/sdb1              40G   35G    5G  88% /mnt/hdd2
/dev/sdc1              60G   10G   50G  17% /mnt/hdd3

After you have installed the mhddfs package using your favourite package manager, you can create a new mount point, let's call it /mnt/virtual, which will join all these drives together for you. The beauty of FUSE means you don't really have to be root for this (can be just a member of the fuse group), but for the sake of examples' simplicity, let's suppose we are logged in as root here.

# mkdir /mnt/virtual
# mhddfs /mnt/hdd1,/mnt/hdd2,/mnt/hdd3 /mnt/virtual -o allow_other
option: allow_other (1)
mhddfs: directory '/mnt/hdd1' added to list
mhddfs: directory '/mnt/hdd2' added to list
mhddfs: directory '/mnt/hdd3' added to list
mhddfs: move size limit 4294967296 bytes
mhddfs: mount point '/mnt/virtual'

The «-o allow_other» option here means that the resulting filesystem should be visible to all users, not just to the one who created it.

The result will look like this:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              80G   50G   30G  63% /mnt/hdd1
/dev/sdb1              40G   35G    5G  88% /mnt/hdd2
/dev/sdc1              60G   10G   50G  17% /mnt/hdd3
mhddfs                180G   95G   85G  53% /mnt/virtual

As you can see, the new filesystem has been created. It joined the total size of all drives together (180G), added together the space used by all files there (95G) and summed up the free space (85G). If you look at files in /mnt/virtual, you'll notice that it has files from all three drives, with all three directory structures “overlayed” onto each other.

But what if you try to add new files somewhere inside that /mnt/virtual? Well, that is quite tricky issue, and I must say the author of mhddfs solved it very well. When you create a new file in the virtual filesystem, mhddfs will look at the free space, which remains on each of the drives. If the first drive has enough free space, the file will be created on that first drive. Otherwise, if that drive is low on space (has less than specified by “mlimit” option of mhddfs, which defaults to 4 GB), the second drive will be used instead. If that drive is low on space too, the third drive will be used. If each drive individually has less than mlimit free space, the drive with the most free space will be chosen for new files.

It's even more than that; if a certain drive runs out of free space in the middle of a write (suppose, you tried to create a very large file on it), the write process will not fail; mhddfs will simply transfer the already written data to another drive (which has more space available) and continue the write there. All this completely transparently for to the application which writes the file (it will not even know that anything happened).

Now you can simply work with files in /mnt/virtual, not caring about what is being read from which disk, etc. Also, the convenience of having large “contiguous” free space means you can simply drop any new files into that folder and (as long as there's space on at least one member of the virtual FS) not care about which file gets stored where.

If you decide to make that mount point creating automatically for you on each boot, you can add the following line to /etc/fstab:

mhddfs#/mnt/hdd1,/mnt/hdd2,/mnt/hdd3 /mnt/virtual fuse defaults,allow_other 0 0

For more details, see man mhddfs.

The last, but not the least important thing to mention, is the fact that it's very simple to stop using mhddfs, if you later decide to do so - and not lose any file data or directory structure. Let's say, at some point in time, you purchase a new 500 GB hard disk, and want to sell all the smaller disks on eBay. You can just plug in the new drive, copy everything from /mnt/virtual onto it, then remove mhddfs mountpoint and disconnect the old drives (don't forget about wiping those before selling, though!). All your folders, which were previously merged in a “virtual” way by mhddfs, will now be merged “in reality”, on the new disk. And thanks to the fact that files themselves are not split into bits which are stored on different drives, even in the unlikely event when mhddfs suddenly no longer works for you (or disappears from existence), you can still copy all your data from all three drives into one single folder, and have the same structure you previously had in that /mnt/virtual mountpoint.

More about mhddfs

mhddfs.txt · Last modified: 2020-05-21 21:09 UTC by rm