Wednesday, September 23, 2009

Developing a Memcached Client (1) - Storage

[This is a continuation of the network development in PLT post]

Memcached is all the rage these days with the high scalability guys.  Legends have it that as soon as you slap a couple of memcached in between your web app and your database server, all of your scalability issues go away.  Even if that's glossing over of details, memcached certainly offers advantages over accessing the discs all the time, and we want it in PLT.

Let's try to build a memcached client.

Steps to Build a Client (or a Server)

The steps to build either a client or a server is pretty straight forward (independent of the complexity of the protocol):
  1. study the protocol 
  2. write the "output functions" that sends the info to the other party 
  3. write the "input functions" that parses the input from the other party 
  4. combine the input & output functions, and manage the states of the connection (if applicable)
You are likely to mix the steps up instead of doing them in the order prescribed above and that's okay.  The steps are numbered to help us to think of them as an orderly process, but the process is likely more iterative.

Allright - let's get started.

Study the Protocol 

The key to build a network interface is to understand the protocols.  It is best if the protocol is open and published, so you have something written that you can study (hopefully in a language you understand).  The next best option is to reverse engineer from an open source implementation, and the worst is to reverse engineer from sniffing the network interactions between the clients and the server.  I am glad such memcached's protocol is already published so we do not have to go down the other routes.

Please refer to the memcached protocols throughout this process as the official reference.  We'll sprinkle information from the official doc in the posts as necessary.

First a quick overview of the memcached protocol.
  • request/response protocol - there are no unexpected responses that the client has to handle 
  • multiple requests and responses within a single session 
  • *mostly* line-based protocol - the commands are single lines that are terminated by CRLF; the data is also terminated by CRLF, but it is actually determined by length parameter
  • can store, retrieve, and delete the content (as well as flush) 
  • there are also statistics that can be retrieved, but the protocol appears in flux for the stats (so it'll be out of scope for now) 
  • the protocol itself is *stateless* as each of the requests and responses are independent from each other (each request is atomic) 
Overall the protocol is simple yet non-trivial, which gives us a great example for network development.  We'll start with the storage part of the protocol.

Storage API

Memcached has multiple storage commands for different situations:

  • set - acts as both insert and update (overwrite)
  • add - insert only if the object does not exist 
  • replace - update only if the object exists 
  • append - update by adding the data after the existing data 
  • prepend - update by adding the data before the existing data 
  • cas (check and set) - store the data but only if it has not been changed by others (think of version conflict management) 
And for everything but cas the API looks like the following:

<cmd> <key> <flags> <expire-time> <bytes-length> [noreply]\r\n
<data-block>\r\n
While cas looks like:

cas <key> <flags> <expire-time> <bytes-length> <cas-unique> [noreply]\r\n
<data-block>\r\n
What we want is to have functions that represents the sending of such commands from the client to the server.  The following functions will satisfy our requirements:

(define (display-line out fmt . args)
  (display (apply format (string-append fmt "\r\n") args)
           out)
  (flush-output out))

(define (display/noreply out fmt noreply? . args)
  (apply display-line out (if noreply? (string-append fmt " noreply") fmt) args))

(define (cmd-store! out type/cas key flags exp-time bytes (noreply? #f))
  (case type/cas
    ((set add replace append prepend)
     (display/noreply out "~a ~a ~a ~a ~a" noreply? type/cas key flags exp-time (bytes-length bytes)))
    (else
     (display/noreply out "cas ~a ~a ~a ~a ~a" noreply? key flags exp-time (bytes-length bytes) type/cas)))
  (display-line out "~a" bytes))
When we are ready to set a data, we pass the data (along with the TCP output-port) to the cmd-store! procedure, and the data will be formatted appropriately according to the protocol.

Handling Responses for Storage API 

Assuming we have sent the requests successfully, then we'll need to read the responses in from the server.

According to the protocol spec the following are possible responses:
  • if noreply is sent - there will not be a response from the server
  • STORED\r\n means it's a success
  • NOT_STORED\r\n means it's not stored but not a failure 
  • EXISTS\r\n is an error for the cas command (it has been modified before your storage) 
  • NOT_FOUND\r\n is an error for the cas command that the object does not exist or has been deleted 
Since the response in this case is all on a single line, we just need to read in the line and then convert it into symbols that we can use to determine success of failure.

(define (response-store in (noreply? #f))
  (if (not noreply?)
      (let ((ln (read-line in 'return-linefeed)))
        (string->symbol (string-downcase ln)))
      'stored))
We'll punt the handling of the message one level further up so we can handle both the request and the response together in this case.

(define (store! in out type/cas key value #:flags (flags 0) #:exp-time (exp-time 0) #:noreply? (noreply? #f))
    (cmd-store! out type/cas key flags exp-time value noreply?)
    (let ((response (response-store in noreply?)))
      (case response
        ((stored not_stored) response)
        ((exists not_found)
         (if (cas? type/cas)
             (error 'store-cas "~a ~a" key response)
             (error 'store "invalid response: ~a" response)))
        (else (error 'store "invalid response: ~a" response)))))
At this moment we have everything we need to tie together a minimal (store-only) memcached client, but first let's try to represent the concept of a client.

Client Abstraction 

As previously mentioned, we need to manage the request, the response, and the state for either the client or the server, so the following would represent the client well:

(define-struct client (host port in out (state #:mutable)))
Then the initiation and the termination of the client would then be:

(define (client-connect host port (state #f))
  (let-values (((in out)
                (tcp-connect host port)))
    (make-client host port in out state)))

(define (client-disconnect client)
  (close-input-port (client-in client))
  (close-output-port (client-out client)))
Oh wait!  Look at the above closely - and you'll see a resemblance of DBI's connect and disconnect API.      DBI's API is simply a more *generalized* version of a network client connections.

With the abstraction of the client structure set - it's simple to hook the client against the above store! procedure to complete the memcached storage API.  You see the below we follow the API against the protocol commands - this approach makes a nice 1 to 1 mapping between the API and the protocol.

(define (memcached-set! client key value #:flags (flags 0) #:exp-time (exp-time 0) #:noreply? (noreply? #f))
  (store! (client-in client) (client-out client) 'set key value #:flags flags #:exp-time exp-time #:noreply? noreply?))

(define (memcached-add! client key value #:flags (flags 0) #:exp-time (exp-time 0) #:noreply? (noreply? #f))
  (store! (client-in client) (client-out client) 'add key value #:flags flags #:exp-time exp-time #:noreply? noreply?))

(define (memcached-replace! client key value #:flags (flags 0) #:exp-time (exp-time 0) #:noreply? (noreply? #f))
  (store! (client-in client) (client-out client) 'replace key value #:flags flags #:exp-time exp-time #:noreply? noreply?))

(define (memcached-append! client key value #:flags (flags 0) #:exp-time (exp-time 0) #:noreply? (noreply? #f))
  (store! (client-in client) (client-out client) 'append key value #:flags flags #:exp-time exp-time #:noreply? noreply?))

(define (memcached-prepend! client key value #:flags (flags 0) #:exp-time (exp-time 0) #:noreply? (noreply? #f))
  (store! (client-in client) (client-out client) 'prepend key value #:flags flags #:exp-time exp-time #:noreply? noreply?))

(define (memcached-cas! client key value cas #:flags (flags 0) #:exp-time (exp-time 0) #:noreply? (noreply? #f))
  (store! (client-in client) (client-out client) cas key value #:flags flags #:exp-time exp-time #:noreply? noreply?))
That's it!  The following is a quick test of the memcached API that we've just developed, make sure you have memcached up and running:

(let ((c (client-connect "localhost" 11211)))
  (begin0 (memcached-set! c 'test-me #"this is a test data")
          (client-disconnect c)))
;; => 'stored
This rudimentary memcached client demonstrates the process of writing a network client - the rest is to simply complete the protocols.  Some protocols would have more complex serialization and state management requirements, but the rest is the same.  We'll fill out the rest in the upcoming posts.  Stay tuned.

No comments:

Post a Comment