Monday, November 9, 2009

Building a Web Session Store (1)

Given that we have previously determined the need for a web session store even if we are using continuations, we'll go ahead and build it on top of our DBI stack, so the session data can be persisted as long as necessary.

Quick Word About Session Store Performance

One thing to note about session data is that it is both read-intensive and write-intensive, and such usage can put strain on the database. It's write-intensive because each request extends the expiration time of the session itself, and it's read-intensive because the data is needed for every request, even though it changes with every request.

For now we'll assume that our database is capable of handling such needs (and it will be until you have a sufficiently large amount of traffic), but it's something to keep in mind. The nice thing about building the session logic on top of DBI is that when we need to deal with the performance issue, we can add logic to the DBI tier easily by developing a custom driver, for example, one that integrates memcached as an intermediate store and flushes the changes to the database once in a while instead of with every request.
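
To make the "flush once in a while" idea concrete, here is a minimal sketch of a write-behind buffer. It is only an illustration under stated assumptions: it uses a plain in-memory hash table instead of memcached, it targets `#lang racket/base` (this post predates that name), it is not thread-safe, and `buffer-write!`, `flush-pending!`, and `write-to-db!` are hypothetical names that are not part of DBI.

```racket
#lang racket/base

;; Hypothetical write-behind buffer; none of these names come from DBI.
;; pending maps a session id to the most recent data written for it.
(define pending (make-hash))

;; Record a change in memory only. A later write for the same id
;; overwrites the earlier one, so only the latest value is kept.
(define (buffer-write! id data)
  (hash-set! pending id data))

;; Push every buffered entry to the database through write-to-db!,
;; a caller-supplied procedure taking an id and its data, then
;; empty the buffer.
(define (flush-pending! write-to-db!)
  (hash-for-each pending write-to-db!)
  (hash-clear! pending))

;; Usage: three buffered writes touch two sessions, so the flush
;; issues only two database writes. The "database" is a list here.
(define db-writes '())
(buffer-write! "s1" 'a)
(buffer-write! "s1" 'b)
(buffer-write! "s2" 'c)
(flush-pending!
 (lambda (id data)
   (set! db-writes (cons (cons id data) db-writes))))
```

The trade-off is the usual one for write-behind caching: anything still sitting in the buffer is lost if the process dies before the next flush.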

Active Record

The active record pattern is not just for OOP fanatics - we schemers know that you can craft your own OOP with FP easily. In DBI today there is a base structure for active record definition:

(define-struct active-record (handle id)) 
Such a definition is a lot simpler than the usual OOP representations, which usually try to construct the data model in memory, along with dynamically constructed SQL statements. Although such OOP records provide simplicity for the simple cases, they have proven to be a leaky abstraction due to the object vs relational paradigm mismatch, and they carry a significant performance overhead. Our simple definition will do us just fine right now.

What would our session API look like then?

;; expiration is a julian-day 
;; store is a hash table 
(define-struct (session active-record) (expiration store) #:mutable) 

;; the session key/value manipulation calls... 
(define (session-ref session key (default #f)) ...) 
(define (session-set! session key val) ...) 
(define (session-del! session key) ...) 
;; the persistence calls 
(define (build-session handle ...) ...) 
(define (save-session! session) ...) 
(define (refresh-session! session) ...) 
(define (destroy-session! session) ...) 

We'll go through and flesh out the definitions in detail.
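
Before doing so, it may help to see what the struct declaration alone gives us. The sketch below assumes the base record is declared with `define-struct` and runs under `#lang racket/base`; the handle, id, and expiration values are placeholders, not real DBI values.

```racket
#lang racket/base

;; Assumed base definition: a struct holding a database handle and a row id.
(define-struct active-record (handle id))

;; session inherits handle and id, and adds two mutable fields.
(define-struct (session active-record) (expiration store) #:mutable)

;; Placeholder values: a symbol stands in for a real DBI handle, and
;; the expiration is an arbitrary julian-day number.
(define s
  (make-session 'fake-handle "abc123" 2455145.5 (make-immutable-hash '())))

;; The inherited accessors work on a session.
(define the-id (active-record-id s))

;; A session is also an active-record.
(define both? (and (session? s) (active-record? s)))

;; #:mutable generates setters for the fields declared on session.
(set-session-expiration! s 2455146.5)
```

Note that only `expiration` and `store` get setters; the inherited `handle` and `id` fields stay immutable because the base struct is not declared `#:mutable`.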

The Store in Memory

A hash table is a good internal representation of the key/value pairs that a session will hold (for now we'll assume the held data is serializable... we'll deal with this problem later), and this immediately tells us what session-ref, session-set!, and session-del! will look like:

(define (session-ref session key (default #f)) 
  (hash-ref (session-store session) key default)) 

(define (session-set! session key val) 
  (set-session-store! session 
                      (hash-set (session-store session) key val)))

(define (session-del! session key) 
  (set-session-store! session 
                      (hash-remove (session-store session) key)))
And yes - we are using an immutable hash rather than a mutable hash.
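
A quick way to see what the immutable hash buys us is to keep a reference to the old table and check that it is untouched after an update. The following is a self-contained sketch under `#lang racket/base`; it assumes the base record is a `define-struct`, that `session-del!` removes the key from the session's store, and it uses placeholder values for the handle and expiration.

```racket
#lang racket/base

(define-struct active-record (handle id))   ; assumed base struct
(define-struct (session active-record) (expiration store) #:mutable)

(define (session-ref session key (default #f))
  (hash-ref (session-store session) key default))

(define (session-set! session key val)
  (set-session-store! session
                      (hash-set (session-store session) key val)))

(define (session-del! session key)
  (set-session-store! session
                      (hash-remove (session-store session) key)))

;; 'h and 0 are placeholders for a DBI handle and an expiration.
(define s (make-session 'h "abc123" 0 (make-immutable-hash '())))
(define before (session-store s))   ; snapshot of the empty table

(session-set! s 'user "alice")
(define after-set (session-ref s 'user))        ; "alice"
(define in-snapshot (hash-ref before 'user #f)) ; #f, snapshot unchanged
(session-del! s 'user)
(define after-del (session-ref s 'user 'none))  ; 'none
```

Each update swaps a new table into the struct, so any code still holding the old table sees a consistent view of it.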

When to Persist

You have probably noticed that session-set! and session-del! do not persist out to the database. So if you have multiple concurrent connections for the same session, it might be possible for the session object to get out of sync.

While this is possible, the chance of it happening isn't great, since for the majority of the time users are going to make one main request at a time, with many auxiliary requests for accompanying images and CSS files that should not modify session values.

On the other hand, saving every change with each single session-set! call could drastically increase the read & write access for the session object (what's the point of saving with each write if you are not doing the same for each read?) and could have a detrimental impact on performance unless we are ready to implement an intermediate cache.

And finally, such decoupling actually simplifies the code (I have written the code with the other approach for comparison) and makes it look more refactored. So for now we'll go with this approach.

Hence we'll persist at the end of the request with a call to save-session!. A simple wrapper so you do not have to explicitly write the separate call would be:

(define (call-with-session session proc) 
  (dynamic-wind void 
                (lambda () 
                  (proc session)) 
                (lambda ()
                  (save-session! session))))
And with a current-session parameter we can simplify it as:

(define current-session (make-parameter #f))

(define (with-session session proc) 
  (call-with-session session 
                     (lambda (session)
                       (parameterize ((current-session session)) 
                         (proc)))))
Except for one bug, the above will work as you expect in a web-server environment. If you have an idea of what the bug will be - please feel free to make a comment. I'll discuss the bug and how to fix it in the next post in the series.
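
In the meantime, the wrapper can be exercised outside the web server. The sketch below is only a test harness under stated assumptions: it runs under `#lang racket/base`, the base record is assumed to be a `define-struct`, the handle and expiration are placeholders, and `save-session!` is a stub that counts calls instead of writing to the database.

```racket
#lang racket/base

(define-struct active-record (handle id))   ; assumed base struct
(define-struct (session active-record) (expiration store) #:mutable)

(define (session-set! session key val)
  (set-session-store! session
                      (hash-set (session-store session) key val)))

;; Stub: counts calls instead of writing to the database.
(define save-count 0)
(define (save-session! session)
  (set! save-count (add1 save-count)))

(define (call-with-session session proc)
  (dynamic-wind void
                (lambda () (proc session))
                (lambda () (save-session! session))))

(define current-session (make-parameter #f))

(define (with-session session proc)
  (call-with-session session
                     (lambda (session)
                       (parameterize ((current-session session))
                         (proc)))))

;; 'h and 0 are placeholders for a DBI handle and an expiration.
(define s (make-session 'h "abc123" 0 (make-immutable-hash '())))

;; Two writes inside one simulated request lead to a single save.
(with-session s
  (lambda ()
    (session-set! (current-session) 'user "alice")
    (session-set! (current-session) 'cart '(1 2 3))))

;; The save also runs when the handler raises an exception,
;; because the post thunk of dynamic-wind runs on the way out.
(with-handlers ((exn:fail? void))
  (with-session s (lambda () (error "handler failed"))))
```

After both calls the stub has been invoked twice, once per simulated request, regardless of how many times session-set! was called inside.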
