Voldemort: Installation to First Run


What is Voldemort?

Voldemort is a distributed key-value storage system.

  • Data is automatically replicated over multiple servers.
  • Data is automatically partitioned so each server contains only a subset of the total data.
  • Server failure is handled transparently
  • Pluggable serialization is supported to allow rich keys and values including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, and Java Serialization
  • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system
  • Each node is independent of other nodes with no central point of failure or coordination
  • Good single node performance: you can expect 10-20k operations per second depending on the machines, the network, the disk system, and the data replication factor
  • Support for pluggable data placement strategies to support things like distribution across data centers that are geographically far apart.

It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient. It is still a new system that has rough edges, bad error messages, and probably plenty of uncaught bugs. Let us know if you find one of these, so we can fix it.

How is it different?

Voldemort is not a relational database; it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table.

For applications that can use an O/R mapper like ActiveRecord or Hibernate, this will provide horizontal scalability and much higher availability, but at a great loss of convenience. For large applications under internet-type scalability pressure, a system will likely consist of a number of functionally partitioned services or APIs, which may manage storage resources across multiple data centers using storage systems that may themselves be horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible, since all the data is not available in any single database. A typical pattern is to introduce a caching layer, which will require hash table semantics anyway. For these applications Voldemort offers a number of advantages:

  • Voldemort combines in-memory caching with the storage system so that a separate caching tier is not required (instead the storage system itself is just fast).
  • Unlike MySQL replication, both reads and writes scale horizontally
  • Data partitioning is transparent, and allows for cluster expansion without rebalancing all data
  • Data replication and placement is decided by a simple API to be able to accommodate a wide range of application specific strategies
  • The storage layer is completely mockable so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster (or even a real storage system) for simple testing

The source code is available under the Apache 2.0 license.

Prerequisites for Voldemort

The following are the prerequisites for installing Voldemort. The ones in parentheses are what I have.

  • Linux (Debian Lenny)
  • Java 6
  • Ant
  • A lot of Memory (But my install has been pretty low end :)
  • A lot of machines or VMs. (This install is single node)

Voldemort Install

One assumption: all these instructions are run as the root user.

This is a source install of Voldemort, so I first need to get the sources from GitHub (http://github.com/voldemort).

$ git clone git://github.com/voldemort/voldemort.git

Voldemort requires Ant and Java 6 to be installed. Before you can install them with apt, you need to enable Lenny's non-free repository by editing the apt sources file.

$ vi /etc/apt/sources.list

Add the following line:

deb http://ftp.iitm.ac.in/debian/ lenny non-free

Then update the apt package index:

$ apt-get update

Now you can install both Java 6 and Ant.

$ apt-get install sun-java6-jre
$ apt-get install sun-java6-jdk
$ apt-get install ant

Now you need to set up the Java executable and library paths:

$ export PATH=/usr/lib/jvm/java-6-sun-1.6.0.12/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/lib/jvm/java-6-sun-1.6.0.12/lib:$LD_LIBRARY_PATH

NOTE: To make these settings permanent, add them to your .bashrc or .bash_profile file.
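A minimal sketch of making those two exports permanent. The helper names append_once and persist_java_env are my own, not part of Voldemort or Debian, and the JDK path is assumed to be the same one used in the exports above; adjust it to whatever is actually under /usr/lib/jvm on your machine.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so running it twice does not duplicate entries.
append_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: write the two exports from this post into a profile file.
# Assumption: the JDK lives at the path used earlier in this post.
persist_java_env() {
  profile=$1
  java_dir=/usr/lib/jvm/java-6-sun-1.6.0.12
  append_once "export PATH=$java_dir/bin:\$PATH" "$profile"
  append_once "export LD_LIBRARY_PATH=$java_dir/lib:\$LD_LIBRARY_PATH" "$profile"
}
```

Calling persist_java_env "$HOME/.bashrc" once should be enough; new shells pick the settings up the next time they start.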

Now that the prerequisites are installed and set up, go to the Voldemort checkout directory and run the following.

$ ant

This should run without errors for the setup to complete. Please comment if you have any issues here, and let's work to resolve them.

First (test) run

Voldemort is now set up, but I ran into an issue because I was running a low-end machine. By default the server script gives the JVM a 2 GB heap (-Xmx2G), so I needed to tweak two files to tell Voldemort how much memory I could spare for it. First, edit the run script:

$ vi bin/voldemort-server.sh

and change this

if [ -z $VOLD_OPTS ]; then
  VOLD_OPTS="-Xmx2G -server -Dcom.sun.management.jmxremote"
fi

to this

if [ -z $VOLD_OPTS ]; then
  VOLD_OPTS="-Xmx256M -server -Dcom.sun.management.jmxremote"
fi
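Since the check shown above only applies a default when VOLD_OPTS is empty, it should also be possible to leave the script alone and set VOLD_OPTS in the environment instead. Here is a small sketch of that pattern. The function default_vold_opts is a hypothetical stand-in for that part of voldemort-server.sh, not the script itself, and I have quoted the variable in the test, which the snippet above does not do.

```shell
# Hypothetical stand-in for the check in bin/voldemort-server.sh:
# the 2 GB default is applied only when VOLD_OPTS is empty.
default_vold_opts() {
  if [ -z "$VOLD_OPTS" ]; then
    VOLD_OPTS="-Xmx2G -server -Dcom.sun.management.jmxremote"
  fi
  printf '%s\n' "$VOLD_OPTS"
}

# Set the smaller heap in the environment instead of editing the script.
VOLD_OPTS="-Xmx256M -server -Dcom.sun.management.jmxremote"
export VOLD_OPTS

# Prints the options that would be used; the 256M value wins over the default.
default_vold_opts
```

With VOLD_OPTS exported like this, starting the server with ./bin/voldemort-server.sh config/single_node_cluster should use the smaller heap. One caveat: because the real check leaves $VOLD_OPTS unquoted, a multi-word value may make the shell print a "too many arguments" complaint from the test, even though the default is still skipped.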

Second, edit the properties file to reduce the BDB cache size.

$ vi config/single_node_cluster/config/server.properties

and change this

bdb.cache.size=1G

to this

bdb.cache.size=128M
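If you would rather script that change than open vi, something like the following could work. The set_property helper is my own, not something shipped with Voldemort, and it assumes simple key=value lines like the one shown above.

```shell
# Hypothetical helper: set key=value in a properties file, replacing the
# existing entry for that key or appending one if the key is missing.
# Note: dots in the key are treated as regex wildcards by grep and sed,
# which is loose but harmless for a key like bdb.cache.size.
set_property() {
  key=$1
  value=$2
  file=$3
  if grep -q "^$key=" "$file"; then
    scratch=$(mktemp)
    # Rewrite via a scratch file so the original keeps its permissions.
    sed "s|^$key=.*|$key=$value|" "$file" > "$scratch" && cat "$scratch" > "$file"
    rm -f "$scratch"
  else
    printf '%s\n' "$key=$value" >> "$file"
  fi
}
```

For this install the call would be set_property bdb.cache.size 128M config/single_node_cluster/config/server.properties, run from the Voldemort checkout directory.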

Now I was ready to run Voldemort. To start the server, do the following.

$ ./bin/voldemort-server.sh config/single_node_cluster 2>&1 | tee /tmp/voldemort.log &

This gives verbose output telling you what's happening and the status of the server. When it says the server has started, press Enter to return to the bash prompt. Then connect with the Voldemort shell and try out a few commands.

$ bin/voldemort-shell.sh test tcp://localhost:6666
Established connection to test via tcp://localhost:6666
> put "hello" "world"
> get "hello"
version(0:1): "world"
> delete "hello"
> get "hello"
null
> exit
k k thx bye.

Now that Voldemort has been set up, you can use it as a distributed key-value database. It is an open source system modeled on Amazon's Dynamo. Note that Dynamo is not the same thing as AWS SimpleDB, which is a separate service.

Why do we need it?

A couple of things.

  • Just for the fun of knowing.
  • The possibility of adding it as a SimpleDB-like service for Eucalyptus.
  • Another capability

References

http://project-voldemort.com/
http://groups.google.com/group/project-voldemort
