GlusterFS: Distributed Filesystem on Euca Instances


It has been about two months since I did any serious blogging. I got a little too busy, I guess, both at work and at home. Plenty going on, though nothing exceptionally fruitful on the personal front.

For a project requirement I was exploring distributed filesystems and hit upon GlusterFS. The installation documentation said it would work with a plain apt-get on Ubuntu 10.04, so I jumped in my seat and did just that.

# apt-get install glusterfs-*

The version that got installed was 2.0.2, while the current stable/GA release is 3.0.4. Brrrr! I did not give up there. I read through the GlusterFS documentation and stopped right here…

This documentation holds good only for “glusterfs-volgen” from 2.0.9 and 3.0.x releases

That was a real bummer. So I brought down the Euca Ubuntu instances, brought up two new ones, and downloaded the GlusterFS source from here.

Before I jump to the installation and configuration of distributed filesystem, GlusterFS, here are the instance details.

+ Two small instances
– 172.19.1.34
– 172.19.1.35
+ Two EBS Volumes
– 172.19.1.34 – 5G mounted at /export/sdb – ext3
– 172.19.1.35 – 5G mounted at /export/sdb – ext3
+ Installed fuse-utils, flex and bison on both instances

To put the end first, the installation was a breeze, but there are too many loose ends on the GlusterFS documentation front. The installation steps that follow are the same for both instances; I will tell you when there is a difference.

Download the source tar.gz file

# wget http://ftp.gluster.com/pub/gluster/glusterfs/3.0/LATEST/glusterfs-3.0.4.tar.gz

Untar, configure and install

# tar xzf glusterfs-3.0.4.tar.gz
# cd glusterfs-3.0.4
# ./configure
# make && make install

The package files and directories are installed under /usr/local/{bin,lib,var,etc}. Make sure the dynamic linker can find the libraries in /usr/local/lib (check the ld.so configuration and run ldconfig if needed). Now, from any one of the machines, run the following command to create the volume definition files for the servers and the client volume configuration file. I have chosen the name “hawaii” for this distributed store.

# glusterfs-volgen --name hawaii 172.19.1.34:/export/sdb 172.19.1.35:/export/sdb

This will create the following

# ls -l *.vol
-rw-r--r-- 1 root root  653 2010-07-07 10:36 172.19.1.34-hawaii-export.vol
-rw-r--r-- 1 root root  653 2010-07-07 10:36 172.19.1.35-hawaii-export.vol
-rw-r--r-- 1 root root 1430 2010-07-07 10:36 hawaii-tcp.vol

The hawaii-tcp.vol is the client configuration file. You carry it to any machine that has glusterfs installed and use it to mount the distributed filesystem locally.

The others are node-specific glusterfs volume configurations, specifying which mount points are going to be part of the distributed glusterfs, in this case /export/sdb on both instances.
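If you have more than two nodes, the volgen command line simply grows by one host:/path argument per node. Here is a small sketch of how I would build that command line from a list of hosts. The helper names brick_args and volgen_command and the EXPORT_DIR variable are my own, not part of GlusterFS, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of GlusterFS.
# EXPORT_DIR is the brick directory used in this post; override it if yours differs.
EXPORT_DIR="${EXPORT_DIR:-/export/sdb}"

# Print one "host:/path" brick argument per host given on the command line.
brick_args() {
    for host in "$@"; do
        printf '%s:%s ' "$host" "$EXPORT_DIR"
    done
}

# Print the volgen command for review instead of executing it.
volgen_command() {
    name="$1"
    shift
    # The unquoted substitution is word-split, so echo joins the bricks with single spaces.
    echo glusterfs-volgen --name "$name" $(brick_args "$@")
}

volgen_command hawaii 172.19.1.34 172.19.1.35
```

With the two addresses from this post it prints the same command I ran by hand above; adding a third address adds a third brick argument.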

Now cp/mv the node-specific volume configurations to the respective nodes. Also copy the client configuration to all nodes. Then do the following on both nodes, where 172.19.1.3x stands for that node's own address.

# cp 172.19.1.3x-hawaii-export.vol /usr/local/etc/glusterfs/glusterfsd.vol
# cp hawaii-tcp.vol /usr/local/etc/glusterfs/glusterfs.vol
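If you generated the files on one machine, the copying can be scripted. What follows is only a sketch: it prints the scp commands rather than running them, it assumes root SSH access to the instances (my assumption, adjust to your setup), and copy_plan is a made-up helper name. The file names follow the pattern volgen produced above.

```shell
#!/bin/sh
# Sketch: print the scp commands that would place each volfile on its node.
# Assumes root SSH access to the nodes, which is an assumption on my part.
VOLNAME="hawaii"
CONF_DIR="/usr/local/etc/glusterfs"

copy_plan() {
    for host in "$@"; do
        # Server volfile: one per node, named <host>-<volume>-export.vol by volgen.
        echo "scp ${host}-${VOLNAME}-export.vol root@${host}:${CONF_DIR}/glusterfsd.vol"
        # Client volfile: the same file goes to every node.
        echo "scp ${VOLNAME}-tcp.vol root@${host}:${CONF_DIR}/glusterfs.vol"
    done
}

copy_plan 172.19.1.34 172.19.1.35
```

Once the printed commands look right, piping the output to sh would execute them.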

Start the glusterfs daemon on both the nodes.

# glusterfsd

You can see the logs in /usr/local/var/log/glusterfs. Now mount the distributed filesystem

# mkdir -p /mnt/hawaii
# glusterfs --volfile-server=172.19.1.35 /mnt/hawaii

Run those commands on both instances. Now have a look at the filesystems on one of the instances,

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             2.9G  774M  2.0G  28% /
udev                  248M  148K  247M   1% /dev
none                  248M     0  248M   0% /dev/shm
none                  248M   52K  248M   1% /var/run
none                  248M     0  248M   0% /var/lock
none                  248M     0  248M   0% /lib/init/rw
/dev/sda2             1.4G   15K  1.3G   1% /mnt
/dev/sdb              5.0G  139M  4.6G   3% /export/sdb
172.19.1.35           9.9G  277M  9.1G   3% /mnt/hawaii

Note: The distributed filesystem is about 10G (df reports 9.9G), the two 5G volumes combined.
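To see where that number comes from, here is the arithmetic on the df figures, as a rough sketch. I converted the 5.0G brick size to 5120M by hand, and the usage of the second brick is not shown in this post, so I assume it matches the first. The input format and the sum_bricks helper are my own.

```shell
#!/bin/sh
# Illustrative helper: sum the size and used columns (in MB) over brick lines on stdin.
# Input format (my own, not a GlusterFS format): <node> <size_mb> <used_mb>
sum_bricks() {
    awk '{ size += $2; used += $3 } END { printf "%d %d\n", size, used }'
}

# Figures rounded from the df -h output: 5.0G is about 5120M, 139M used per brick.
# The second brick's usage is assumed equal to the first.
printf '%s\n' \
    '172.19.1.34 5120 139' \
    '172.19.1.35 5120 139' | sum_bricks
```

That prints 10240 278, i.e. about 10G in total and 278M used, which lines up with the 9.9G and 277M that df reports for /mnt/hawaii, give or take rounding.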

Now you can create a file on one of the instances and read or write it from the other. You now have a distributed filesystem up and running.
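A quick way to convince yourself is to write a token on one instance and look for it on the other. The two functions below are a minimal sketch of that check; PROBE_NAME, write_probe and check_probe are names I made up, and you have to carry the printed token from one instance to the other yourself.

```shell
#!/bin/sh
# Sketch of a cross-instance check. PROBE_NAME is a hypothetical file name.
PROBE_NAME=".gluster-probe"

# Run on the first instance: write a token into the mounted volume and print it.
write_probe() {
    token="probe-$(date +%s)-$$"
    printf '%s\n' "$token" > "$1/$PROBE_NAME" && printf '%s\n' "$token"
}

# Run on the second instance: succeed only if the same token is visible there.
check_probe() {
    [ "$(cat "$1/$PROBE_NAME" 2>/dev/null)" = "$2" ]
}
```

On 172.19.1.34 you would run write_probe /mnt/hawaii and note the token, then on 172.19.1.35 run check_probe /mnt/hawaii with that token and look at the exit status.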

You could also raise a third instance, install GlusterFS from source there, and use only the client configuration hawaii-tcp.vol, i.e.,

# cp hawaii-tcp.vol /usr/local/etc/glusterfs/glusterfs.vol

And then mount the distributed filesystem

# mkdir -p /mnt/hawaii
# glusterfs --volfile-server=172.19.1.35 /mnt/hawaii
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             2.9G  774M  2.0G  28% /
udev                  248M  148K  247M   1% /dev
none                  248M     0  248M   0% /dev/shm
none                  248M   52K  248M   1% /var/run
none                  248M     0  248M   0% /var/lock
none                  248M     0  248M   0% /lib/init/rw
/dev/sda2             1.4G   15K  1.3G   1% /mnt
172.19.1.35           9.9G  277M  9.1G   3% /mnt/hawaii
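Instead of eyeballing df, a script can check whether the mount point is active by looking at the kernel's mount table. This is a small sketch assuming a Linux system with /proc/mounts; is_mounted is my own helper name, and the optional second argument lets you point it at a different mounts file.

```shell
#!/bin/sh
# Succeeds if the given directory appears as a mount point in a mounts table.
# The table defaults to /proc/mounts (Linux); the second argument overrides it.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example, is_mounted /mnt/hawaii && echo mounted prints "mounted" only when something is mounted at /mnt/hawaii.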

And now you are ready to read and write data to this distributed filesystem. Having done this on Eucalyptus, it should be fairly similar to do the same on EC2 as well.

2 thoughts on “GlusterFS: Distributed Filesystem on Euca Instances”

  1. As a computer science degree student, time is very important, and the time allocated for each assignment is very limited. If you have ever been a student, you will understand.

    For one of my assignments, I need to choose an open source distributed file system and enhance it. I chose GlusterFS.

    There are many source files in GlusterFS, and most of them are very long. There are no comments or explanations in the source files.

    Since GlusterFS is a server application, I want to find out which part of the source code accepts connections from clients. However, due to the complexity of the source code, I would need to spend a huge amount of time to find it.

    I suspect that by the time I find which part of the source code accepts client connections, my semester may have ended.

    So, is there an easier and faster way to analyze the source code of an open source project and find the part that implements a specified feature?

    Or, can you give me some suggestions or comments?

    And, I hope that the solution will not require me to contact the developers.
