Integrating LVM with Hadoop to provide dynamic storage to the DataNode

Deepanshu Yadav
5 min read · Nov 15, 2020

In many companies, static partitions are mounted for the Hadoop DataNode. Since data keeps arriving, a time comes when the storage is full and we have to stop that node, which is a waste of resources and money. On the other hand, if we allocate lots of storage up front, most of it sits unused, which is again a waste. So dynamic partitions/storage are always preferred in Hadoop, since their size can be increased anytime on the fly, even without unmounting.

So here we are going to create dynamic storage for Hadoop using LVM, and we will also see how its size can be increased or decreased on the fly. Let's start.

Using the lsblk command you can see that I have one root hard disk, sda, and three more hard disks attached to my RedHat VM: sdb, sdc, and sdd. I am going to build the LVM from sdb and sdc, and then show you how to add the sdd disk dynamically.

LVM is a pure Linux concept; if you have some knowledge of partitions it will be easy to understand. A fascinating use case of LVM: since we have two disks of 5 GB and 10 GB, neither can store a 12 GB file/video on its own, but using LVM we can pool these two disks together and then store that 12 GB file across them. Let's see now how to do it.

First, we always have to convert the disks that we want to use in LVM into physical volumes. The command for this is 👇
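A sketch of this step, assuming the two disks are /dev/sdb and /dev/sdc as shown by lsblk above (run as root):

```shell
# Turn the raw disks into LVM physical volumes
pvcreate /dev/sdb /dev/sdc

# Verify that both physical volumes now exist
pvdisplay /dev/sdb /dev/sdc
```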

Now we have to group these physical volumes together and give the grouped hard disks a name. This is called a volume group. I am giving it the name vg1.

If you run the vgdisplay command now, there is no such volume group yet, but you can create it with the following command, and after that you can see it has been created.
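The step above, sketched with the device and group names used in this article:

```shell
# Group the two physical volumes into a volume group named vg1
vgcreate vg1 /dev/sdb /dev/sdc

# Confirm vg1 exists and shows the combined size of both disks
vgdisplay vg1
```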

Now you can finally create the logical volume, or LV (you can think of it as a normal storage disk), and our DataNode will take its space from it. To create it, just one command 👇

Here, we have taken 12 GB from our volume group, given the volume a name, mylv1, and at the end mentioned the name of the volume group the space comes from, i.e. vg1.
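A sketch of that command, using the size and names stated above:

```shell
# Carve a 12 GiB logical volume named mylv1 out of vg1
lvcreate --size 12G --name mylv1 vg1

# Inspect the new logical volume
lvdisplay /dev/vg1/mylv1
```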

Now we can treat it like a normal disk: to store data in it, we first have to format it and then mount it on the DataNode folder we want to share. This is done below 👇
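A sketch of the format-and-mount step. The article does not name the DataNode directory or the filesystem, so /dn1 and ext4 below are assumptions; substitute your own dfs.datanode.data.dir path from hdfs-site.xml:

```shell
# Format the logical volume (ext4 assumed here; other Linux
# filesystems work too)
mkfs.ext4 /dev/vg1/mylv1

# Mount it on the DataNode directory (hypothetical path /dn1 --
# use the directory configured in dfs.datanode.data.dir)
mkdir -p /dn1
mount /dev/vg1/mylv1 /dn1

# df -h should now list /dev/mapper/vg1-mylv1 mounted on /dn1
df -h /dn1
```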

You can also confirm as below

Now we can use this as a datanode.

Now let’s see how we can increase the size of datanode.

There are two ways to increase it:

  1. We can take space from the existing volume group as follows 👇
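A sketch of extending the volume from free space in vg1 (the +2G increment is an assumption for illustration; any amount up to the free space in the group works):

```shell
# Grow mylv1 by 2 GiB using space still free in vg1
lvextend --size +2G /dev/vg1/mylv1

# vgdisplay shows how much free space remains in the group
vgdisplay vg1
```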

Now that we have increased the volume size, the new portion also needs a filesystem on it. If we simply reran the mkfs command, it would format the whole disk and we would lose our data. So, to cover the required space, we instead resize the existing filesystem, as below 👇

See, df -h still shows 12G, as it only reports the size of the filesystem. You can see the new device size with the fdisk command.

Now let's extend the filesystem over the new space.
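A sketch of this step, assuming the ext4 filesystem from earlier (for XFS you would use xfs_growfs instead); resize2fs works online, so the volume can stay mounted:

```shell
# Grow the ext4 filesystem to fill the extended volume,
# without touching the existing data
resize2fs /dev/vg1/mylv1

# df -h now reports the new, larger size
df -h
```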

Now df -h shows the updated size, since the filesystem now covers the whole volume. And the data is unharmed.

2. You can add more disks to this volume group and then use the lvextend command. I am showing this with the sdd drive; you could also attach a disk from the cloud to increase the space.

First we need to convert sdd into a physical volume, and then we can add it to the volume group.
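A sketch of the whole second method, using the sdd disk named above (the +5G increment is an assumption for illustration):

```shell
# Turn the new disk into a physical volume
pvcreate /dev/sdd

# Add it to the existing volume group
vgextend vg1 /dev/sdd

# Extend the logical volume into the new space; the -r
# (--resizefs) flag also resizes the filesystem in the same step
lvextend --size +5G -r /dev/vg1/mylv1
```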

So we have seen that, using LVM, we can increase the size of the DataNode's storage on the fly without losing data.
