Recall HOWTO

This document describes how to use Recall. It shows step-by-step, how to write a server. A number of the implementation decisions are discussed, and there is a high-level description of the source files. This should be everything you need to get started.

Example

In this example, we are going to build a fault-tolerant configuration database. This is a simple attribute-value pair database we will call Av. For this example, we'll implement this database with a python dictionary (hash table).

This is how the the library might be used in a program:

      av.write("update_rate_hz", "60")
      av.write("background_color", "brown")
      color = av.read("background_color")
    

Unless you change it, the arguments and return values for the Recall read/write methods use the dynamic CORBA type "any". So the interface is more likely to look like this:

      recall.write(any(Av.WriteRequest("update_rate_hz", "60")))
      recall.write(any(Av.WriteRequest("background_color", "brown")))
      color = recall.read(any("background_color")).value()
    

Note the Corba module is a Recall module. The any function converts an object from the IDL into an any object.

We'll need to define the data types to send and receive so that they can be exchanged. Here's the IDL for our Av service:


	module Av {

	  typedef string ReadRequest;
	  typedef string ReadResponse;
	  typedef string WriteResponse;

	  struct WriteRequest
	  {
	    string attr;
	    string value;
	  };


	};

    

The only data type that really needs to be defined is WriteRequest since it doesn't map directly to a CORBA type.

Now we need to define our Storage so that the reads and writes can be saved persistently. This example will implement them as trivially as possible.

First, the imports:


        "Simple Attribute-Value Storage"
	# Infrastructure classes
	from Recall.idl import Recall__POA
	from Recall.CorbaUtil import any

	# Interface
	from Av.idl import Av

	# Use to actually implement persistence
	import pickle

The class definition:


	class Storage(Recall__POA.Storage):
	    "Storage implementation that stores attribute-value strings"

Subclassing from Recall__POA.Storage allows this class to be created by a Recall replica. Read and write requests will be forwarded to this class. Let's implement those methods. First, read requests:

	    # class Av/Storage

	    def read(self, args):
		result = self.storage.get(args.value(), "")
		return any(result)

The member variable self.storage is the python dictionary. Now lets take a look at the more complex write method:

	    # class Av/Storage

	    def write(self, serverSequenceId, requestId, args):

		writeRequest = args.value()
		self.storage[writeRequest.attr] = writeRequest.value
		self._save()

		return any(writeRequest.value)

The first step extracts the actual WriteRequest object from the Any. The second statement performs the actual write operation to the state information held in memory. The third statement stores the whole state to disk. Obviously this won't work for a very large database. We'll see the definition of _save below.

That's the basic server, and it will work fine as long as nothing ever goes wrong. We must still implement storage "catchup" for the times that a server loses its persistent store or when it is out-of-date due to system or network failure.

Catchup refers to the process of copying data from the master to a replica (slave) without data or to a replica with old data. So each storage mechanism must be able to make a snapshot of the current state for sending to out-of-date replicas. Our storage will send the dictionary over one attribute-value pair at a time.. First, lets define the data type to send when copying the data over. In Av/Av.idl, we add


          /* module Av in Av/Av.idl */

	  typedef WriteRequest CatchupPacket;

Now lets write the three methods for accepting the catchup data:


	    # class Av/Storage

	    def startCatchup(self):
		self.storage = {}

	    def catchupSome(self, args):
                packet = args.value()
		self.storage[packet.attr] = self.storage[packet.value]

	    def finishCatchup(self):
		self._save()

startCatchup resets the internal state to empty. The catchupSome method gets all the state in one package. If we were replicating a large database, or a file system, this would not be appropriate, but because this is a trivial example, we can just convert the pickled (serialized) dictionary back into a python object. finishCatchupsaves the dictionary to disk. The method to do that is given below.

Since a master can catchup multiple out-of-date replicas, the interface is a little different. The Storage object must act as a factory of "Catchup" objects. These little classes act like iterators over a the current snapshot. Recall will use one of these objects to incrementally copy state from the master for each of the out-of-date replicas.

First, lets make the Factory Method:

	    def createCatchupStrategy(self, replicaCurrentEpoch):
		return CatchupStrategy(self.storage)
Not much there, so lets look at the CatchupStrategy class itself:

	class CatchupStrategy:
	    "Copy snapshot of Av for an out-of-date Replica"

	    def __init__(self, storage):
		# create snapshot
		self.data = storage.values()

	    def hasMore(self):
		return len(self.data) > 0

	    def next(self):
		# create an Any
                attr, value = self.data.pop()
		return any(Av.CatchupPacket(attr, value))

Notice the CatchupStrategy must take a snapshot of the current state. Soon after the catchup object is created, but before the catchup completes, service will begin, and the storage object will begin writing to the state. All that remains is the constructor, to initialize the system, and to save the state to a file:

            def __init__(self, echo):
                self.name = echo.thisReplica.name
                self.storage = {}
                self.catchup = None
                try:
                    fp = open(self.name + ".state", "rb")
                    self.storage = pickle.load(fp)
                    fp.close()
                except IOError:
                    pass
                self.deferred = []

            def _save(self):
                fp = open(self.name + ".state", "wb")
                pickle.dump(self.storage, fp)
                fp.close()


	# end of file
	# 

To start the servers, run them (perhaps on different machines):
         python server.py Av/Storage server1 server2 server3 

         python server.py Av/Storage server2 server1 server3 

         python server.py Av/Storage server3 server1 server2

Now these servers will start and exchange status, and elect a master. Clients can begin requesting service. Note that much of this code is basic CORBA code (although some of it, such as finding the NamingContext are in the utility module "Corba"):

    "Simple client for Av"
    from Recall import CorbaUtil
    from Recall import ClientImpl
    from Recall.idl import Recall

    from Av.idl import Av

    def main():
	# Tune corba interface a bit 
	args = [ "-ORBclientCallTimeOutPeriod", "3",
		 "-ORBscanGranularity", "3"]
	c = CorbaUtil.Corba(args)

	# create local client implementation
	impl = ClientImpl.ClientImpl(c, ['server1', 'server2', 'server3'])
	c.activatePOA()
	client = impl._this()

        writeRequest = \
            CorbaUtil.any(Av.WriteRequest("my_favorite_color", "blue"))
	client.write(writeRequest)
	assert( client.read("my_favorite_color").value() == "blue" )

        print "Test Successful"

    if __name__ == "__main__":
        main()

All but the last two lines are basic set-up to create the local Client and an appropriate naming context to contact the Recall servers. The last two lines save a configuration value, and read it back in. The update is saved to all the available servers. Any switch-overs or re-sends will be transparently handled by Recall.

Implementation Issues and Trade-offs

There are a number of things that Recall is now assuming that will eventually be configurable.

Source Files

Elements from the Echo algorithm
Recall/Recall.idl Recall/Recall.idl describes the external interfaces of the Recall library. This file is copiously documented.
Recall/EchoImpl.py Implements the Echo interface. This object takes client requests, starts elections and manages the basic book-keeping for the algorithm.
Recall/ClientImpl.py Implements the "Clerk" interface, which should be co-located with your client. This interface will track the current master replica retry requests if there are failures.
Recall/Election.py

Implements the Election thread as described in the algorithm.
Recall/PreserveAgreements.py Implements the PreserveAgreements thread as described in the algorithm. This thread implements a "heartbeat" or failure detector. It normally runs in the master and will periodically contact (or try to contact) all the replicas.
Recall/SlaveTimeout.py Implements the SlaveTimeout thread as described in the algorithm.
Recall/Replica.py Bookkeeping for remote replicas. Provides a callable interface for each remote server.
Recall/RunState Defines the enumeration for the run states defined in the Echo algorithm.
Recall/Recovery.py Implements the recovery thread as described in the algorithm.
Recall/Catchup.py Implements the catchup thread for out-of-date replicas.
Support files
Recall/server.py The main program for a single replica. This configures the server communication parameters, the Log module, interprets command line and runs the Orb (main event loop).
Recall/CorbaUtil.py A module providing utilities on top of the CORBA interface.
Recall/Log.py Provides for logging messages and errors as the server executes.
Recall/Thread.py A class for creating a thread that supports cooperative shutdown.
Recall/DuplicateDetector.py Detects client re-sends.
Recall/EpochStorage.py Saves the epoch values persistently.
Recall/InProgress.py Saves the last N-In-Progress updates persistently.
Recall/WriteToken.py Allows up to N concurrent writers within the master.
Test files
Ar/Ar.idl, Ar/Storage.py A storage mechanism like the "Ar" interface described in the algorithm paper.
Block/Block.idl, Block/Storage.py A storage mechanism with a more realistic implementation of data blocks.
Av/Av.idl, Av/Storage.py HOWTO example
IncrDecr/IncrDecr.idl, IncrDecr/Storage.py A storage mechanism whose reads/write produce different states if the updates are performed in the wrong order.
*.sh, test*.py Test drivers.

Eric C. Newton
Last modified: Sun Jun 24 12:30:27 EDT 2001