Integration of LVM with Hadoop-Cluster for making shared storage elastic using AWS Cloud

Adarsha Dinda

6 min read · Apr 6, 2021

Let’s understand a few concepts related to our task.

HADOOP -

Big Data is the problem and Hadoop is one solution to it.
Hadoop is an open-source framework written in Java.
Hadoop handles massive amounts of data.
It addresses the Volume and Velocity problems of Big Data.
Hadoop uses a cluster of machines to store and process humongous amounts of data.

LVM (LOGICAL VOLUME MANAGEMENT)

LVM is a more flexible alternative to plain disk partitions.
Partitioning is of two types, static and dynamic. LVM is the dynamic one.
LVM can increase or decrease the size of an allocated volume at runtime.
LVM builds Logical Volumes out of Volume Groups, which pool space from Physical Volumes.

ELASTICITY-

👩🏻‍🏫Do you remember what college physics taught us about elasticity? Let me define it.

The property of a substance that enables it to change its length, volume, or shape in direct response to a force effecting such a change and to recover its original form upon the removal of the force is called Elasticity.

For storage, elasticity means a volume can be increased or decreased whenever it is needed.
This concept is used in companies because real-world applications need a dynamic approach.

Elasticity is the concept we can use to increase or decrease the storage of a Hadoop data node on the fly. In the real world, the storage a Hadoop data node shares with the cluster can’t be static, so LVM is used to make it dynamic.

Task Description :

🌀7.1: Elasticity Task

🔅Integrating LVM with Hadoop and providing Elasticity to Data Node Storage

🔅Increase or Decrease the Size of Static Partition in Linux.

Now let’s jump to the practical part👇🏻

🌀Step 1 : Integrating LVM with Hadoop and providing Elasticity to Data Node Storage.

🔅First we need to set up the Data Node instance in the AWS console. Before configuring it, I am going to create and attach two more hard disks (EBS volumes) for LVM: LVM1 of size 2 GiB and LVM2 of size 1 GiB.

The ‘create’ part is done. Let’s attach both disks to the data node.

Attachment is complete. Let’s move to the next step.

Now let’s move to our main task: LVM.

We can list all the hard disks attached to the instance by using the command: #fdisk -l

Now we need to convert each physical hard disk into a Physical Volume (PV), because a Volume Group (VG) can only be built from PVs.

To create a PV we use the command: #pvcreate /dev/xvdf and to confirm and display it we use the command: #pvdisplay /dev/xvdf. We repeat the same for the second disk, /dev/xvdg.

Now let’s combine both PVs into one VG of about 3 GiB, named ‘arth’, using this command: #vgcreate arth /dev/xvdg /dev/xvdf
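The PV and VG steps above can be put together as a small script. This is only a minimal sketch, assuming the two new disks show up as /dev/xvdf and /dev/xvdg and the VG is called arth, as in the commands above. The `run` helper and the `APPLY` switch are hypothetical additions of this sketch: by default the script only prints the commands, and it executes them only when started with APPLY=1 as root.

```shell
#!/bin/sh
# Dry run by default: commands are only printed.
# Start the script with APPLY=1 (as root) to really execute them.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

DISKS="/dev/xvdf /dev/xvdg"   # device names from this article; check yours with fdisk -l
VG="arth"                     # Volume Group name used in this article

for disk in $DISKS; do
  run pvcreate "$disk"        # turn each raw disk into a Physical Volume
done
run vgcreate "$VG" $DISKS     # pool both PVs into one Volume Group
run vgdisplay "$VG"           # shows VG Size, Total PE and Alloc PE
```

Running it without APPLY=1 is a safe way to double-check the device names before touching any disk.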

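Here we got a new storage device of nearly 3 GiB. It is slightly less than the full 3 GiB because LVM reserves a small area on each PV for its own metadata and hands out space only in whole physical extents (PEs). Here metadata means data about data, i.e. data about our storage.

And notice one thing: nothing is allocated yet (Alloc PE / Size = 0), because we have not created any Logical Volume so far.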
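To see roughly why the VG ends up at "nearly 3 GiB", here is a small arithmetic sketch. It assumes typical LVM defaults that are not stated in this article: a physical extent (PE) size of 4 MiB and about 1 MiB reserved at the start of each PV. The exact numbers on your system can differ, so compare them with what #vgdisplay reports.

```shell
#!/bin/sh
# Assumptions (typical LVM defaults, not taken from the article):
pe_mib=4      # size of one physical extent in MiB
meta_mib=1    # space reserved at the start of each PV in MiB

# Number of whole extents that fit on a disk of $1 MiB.
usable_extents() {
  echo $(( ($1 - meta_mib) / pe_mib ))
}

pe_2g=$(usable_extents 2048)   # the 2 GiB disk
pe_1g=$(usable_extents 1024)   # the 1 GiB disk
total_pe=$(( pe_2g + pe_1g ))

echo "Total PE: $total_pe, VG size: $(( total_pe * pe_mib )) MiB"
```

Under these assumptions the script prints 766 extents, i.e. 3064 MiB, a little below the 3072 MiB of the two raw disks.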

Let’s carve a partition, i.e. a Logical Volume (LV), out of this new storage.

We create a partition of, let’s say, 1.5 GiB by using the command:

#lvcreate --size 1.5G --name partition-name vg-name

To confirm whether the partition was created we use the command:

#lvdisplay vg-name/partition-name

Now let’s format it using the command: #mkfs.ext4 /dev/myvg/mylv (here myvg and mylv stand for your VG and LV names)

To mount it we first create a directory, because a user needs a folder to interact with a storage device. We use this command: #mkdir /foldername

Here I am using the data node folder. Let’s create one folder named ‘dn’ on the slave node for the datanode service.

command :: mkdir /dn

We mount the volume using the command: #mount /dev/vg-name/lv-name /foldername and by using the #df -h command we can confirm whether it’s mounted or not.
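The create, format and mount steps can be sketched as one script. The VG name arth and the folder /dn come from this article. The LV name mylv1 is borrowed from the resize2fs command used in Step 2, and the `run` helper with its `APPLY` switch is a hypothetical addition so that nothing is executed unless you ask for it.

```shell
#!/bin/sh
# Dry run by default: commands are only printed.
# Start the script with APPLY=1 (as root) to really execute them.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

VG="arth"      # Volume Group created earlier
LV="mylv1"     # Logical Volume name; pick your own
MNT="/dn"      # folder the data node will use for its storage

run lvcreate --size 1.5G --name "$LV" "$VG"   # carve 1.5 GiB out of the VG
run lvdisplay "$VG/$LV"                       # confirm the LV exists
run mkfs.ext4 "/dev/$VG/$LV"                  # format it (destroys any data on the LV)
run mkdir -p "$MNT"                           # create the mount point
run mount "/dev/$VG/$LV" "$MNT"               # mount the LV on the folder
run df -h "$MNT"                              # confirm size and mount point
```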

🔅 Now we need to configure the HDFS cluster.

Inside Master Node ::

Now we will configure the hdfs-site.xml file inside the /etc/hadoop folder (#cd /etc/hadoop).

Now let’s configure the core-site.xml file.

Inside the master node we gave the neutral IP 0.0.0.0. It is not a gateway: it tells the name node to listen on all of its network interfaces, so other systems can connect to it through either its private or its public IP.

In my case I am using port 9001. You can check which ports are already in use on your system with #netstat -tnlp and pick a free one.
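As a reference, here is a minimal sketch of what the two master-node files could contain. The property names dfs.name.dir and fs.default.name are an assumption based on the Hadoop 1.x style commands used in this article (newer releases prefer dfs.namenode.name.dir and fs.defaultFS). `CONF_DIR` is a hypothetical variable of this sketch: it defaults to a local folder so that nothing in /etc/hadoop is overwritten by accident.

```shell
#!/bin/sh
# Set CONF_DIR=/etc/hadoop when running on the real master node.
CONF_DIR="${CONF_DIR:-./hadoop-conf}"
mkdir -p "$CONF_DIR"

# hdfs-site.xml: folder where the name node keeps its metadata
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
EOF

# core-site.xml: address and port the name node listens on
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
EOF
```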

Before connecting any data node or using any storage we need to create one folder named ‘nn’ (using the command #mkdir /nn) and format the name node directory using #hadoop namenode -format

Now to start the master node service we use #hadoop-daemon.sh start namenode and we can verify it by using the #jps command.

Now the master node is configured. Let’s configure the data node. On this node core-site.xml is configured exactly as on the name node; only the IP changes, because it must point to the master node’s IP instead of 0.0.0.0.

Inside Data Node ::

Let’s configure the hdfs-site.xml file inside the /etc/hadoop folder (#cd /etc/hadoop).

Let’s configure the core-site.xml file inside the same /etc/hadoop folder.
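The two data-node files could look like the sketch below. As on the master, the Hadoop 1.x property names dfs.data.dir and fs.default.name are an assumption, `CONF_DIR` is a hypothetical variable that defaults to a local folder, and `MASTER_IP` is a placeholder address that you must replace with the real IP of your master node.

```shell
#!/bin/sh
# Set CONF_DIR=/etc/hadoop when running on the real data node.
CONF_DIR="${CONF_DIR:-./hadoop-conf}"
# Placeholder address; replace it with your master node's IP.
MASTER_IP="${MASTER_IP:-203.0.113.10}"
mkdir -p "$CONF_DIR"

# hdfs-site.xml: folder (mounted on the LVM volume) where blocks are stored
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
EOF

# core-site.xml: where to find the name node
# (the heredoc delimiter is unquoted so that ${MASTER_IP} is expanded)
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${MASTER_IP}:9001</value>
  </property>
</configuration>
EOF
```

Because dfs.data.dir points at /dn, every byte the data node stores lands on the logical volume, which is what makes the storage resizable later.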

The data node stores its blocks in the folder named ‘dn’. We already created it (#mkdir /dn) and mounted the logical volume on it, so this storage lives on LVM.

Now to start the data node service we use #hadoop-daemon.sh start datanode and we can verify it by using the #jps command.

Hence the HDFS cluster, i.e. the Hadoop cluster, is configured successfully👍🏻 We can verify it by using the command: #hadoop dfsadmin -report

So now the data node is sharing about 1.5 GiB of storage with the cluster.

Let’s jump to the next step.

🌀Step 2 : Increase or Decrease the Size of Data Node Storage as per requirement.

We can extend the size of the partition using the command: #lvextend --size +1G /dev/myvg/mylv. Once the volume is extended successfully we can confirm it by using #lvdisplay /dev/myvg/mylv

Here a challenge comes up: we can see that the LV size is now 2.5 GiB (1.5 GiB + 1 GiB), but #df -h still shows only about 1.5 GiB. Why so?

Because when we first created the partition we formatted only 1.5 GiB, the ext4 filesystem still covers just that part of the LV. To use the new space we have to grow the filesystem, but not with mkfs.ext4, because that command would also remove our important data. Instead we use the command: #resize2fs /dev/arth/mylv1. There is no need to mount it again; the filesystem grows while it stays mounted, and we can confirm it by using the #df -h command. Here -h means human-readable format: sizes are printed in units such as M and G instead of raw 1K blocks.
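The extend-then-resize sequence can be sketched as follows. The names arth and mylv1 are taken from the resize2fs command above, so adjust them to your own VG and LV. The `run` helper and the `APPLY` switch are hypothetical additions that keep the script in print-only mode unless you start it with APPLY=1 as root.

```shell
#!/bin/sh
# Dry run by default: commands are only printed.
# Start the script with APPLY=1 (as root) to really execute them.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

VG="arth"
LV="mylv1"
DEV="/dev/$VG/$LV"

run lvextend --size +1G "$DEV"   # grow the Logical Volume by 1 GiB
run resize2fs "$DEV"             # grow the ext4 filesystem to fill the LV, data is kept
run df -h /dn                    # the mounted size should now show the extra space
```

lvextend also has a --resizefs (-r) option that extends the volume and resizes the filesystem in one step, which avoids forgetting the second command.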

Hurrah! It’s updated.

Now let’s check the Hadoop report once again (#hadoop dfsadmin -report).

It is updated here too.

Conclusion:

In this task we built an HDFS cluster, learnt how to create the LVM architecture (PV, VG, LV), and saw how the data node uses LVM storage instead of a folder on the root disk. We also came to know that LVM provides elasticity to a storage device through dynamic partitioning.

Finally, both our tasks, i.e. TASK 7.1.A & TASK 7.1.B, are successfully accomplished.

Thank you for reading my article

Keep Learning🤩Keep Sharing🤝🏻