Thursday, September 17, 2009

Create a Driver for bzlib/dbi (2) - Filesystem Driver

In our previous post - Create a Driver for bzlib/dbi (1) - DBI Internals - we discussed the motivation and the DBI internals, so we can now start implementing our filesystem-based driver.  If you find some of the following a bit opaque, read the previous post for explanations.


Caveats about Filesystems and the Driver

Since each filesystem has different behaviors it is difficult to guarantee particular performance characteristics about the drivers. For now the following are the caveats:
  • No transactional support - it takes more effort to build in transactional support for bare filesystems - something we will not tackle for now 
  • No atomic writes on Windows filesystems - Windows does not fully support atomic rename (a program holds a lock on a file while it has the file open, so a rename over that file causes an error), so we cannot guarantee that writes will succeed on Windows.  Under Unix variants we can guarantee atomic writes 
Furthermore, since our goal is to have a simple driver, the following are also out of scope:
  • SQL statement mappings - for now we will not support SQL statements
  • Prepared statements - since we are not supporting complex SQL mappings, there is no reason to have prepared statement capabilities; i.e. the prepare call will be a no-op.
With caveats out of the way - let's determine what we should be able to do at a minimum:
  • manage all files within a single directory as the database data 
  • open files by path and return their contents 
  • list the files in a particular directory (within the base directory) 
  • save data against a particular path (this can be either an "insert" or an "update" operation) atomically on Unix platforms (might cause errors on Windows given the limitation described above) 
  • delete a particular file 
  • delete a particular directory 
  • create a particular directory (even without the intermediate directories) 
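
Since atomic saves and creating intermediate directories are both on the list, here is a minimal sketch of one way they could work, assuming the usual write-to-a-temporary-file-then-rename approach. save-bytes-atomically! is a hypothetical name, not part of bzlib/dbi or of the driver built in this post, and per the caveat above the rename should only be expected to be atomic on Unix variants.

```scheme
#lang scheme/base
(require scheme/file)   ; make-directory*, make-temporary-file, file->bytes

;; save-bytes-atomically! is a hypothetical helper for illustration only.
;; full-path must name a file inside some directory (not a bare file name
;; and not a filesystem root).
(define (save-bytes-atomically! full-path data)
  (let-values (((dir name must-be-dir?) (split-path full-path)))
    ;; create the target directory along with any missing intermediates
    (make-directory* dir)
    ;; create the temporary file next to the target so that the rename
    ;; stays within a single filesystem
    (let ((tmp (make-temporary-file "bzlibtmp~a" #f dir)))
      (call-with-output-file tmp
        (lambda (out) (write-bytes data out))
        #:exists 'truncate)
      ;; #t allows replacing an existing file, so the same call covers
      ;; both the "insert" and the "update" case
      (rename-file-or-directory tmp full-path #t))))
```

Creating the temporary file in the target's own directory is the design choice that matters here: a rename across filesystems is not atomic, so the temporary file has to live where the final file will.
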
All right - let's get started.

Connect, Disconnect, Prepare, and Transaction Handling 

Since we want to have a directory representing the root of the database, our database connection is really the root directory:

#lang scheme/base 
(require (planet bzlib/base)
         (planet bzlib/dbi)
         )

(define (file-connect driver path) 
  (assert! (directory-exists? path)) ;; assert! comes from bzlib/base 
  (make-handle driver path (make-immutable-hash-registry) 0)) 

Disconnect is even more straightforward, since there aren't any external resources that have to be released:

(define (file-disconnect handle)
  (void)) 

And since prepare is out of scope, it is also a no-op:

(define (file-prepare handle stmt)
  (void)) 

Furthermore, transaction support is also out of scope, so we have more no-ops:

(define (file-begin handle)
  (void))
(define (file-commit handle)
  (void))
(define (file-rollback handle)
  (void)) 

The default transaction functions will not suffice here since they issue the corresponding SQL statements against the handle.

Assuming we have the corresponding file-query defined, we now have a complete driver with:

(registry-set! drivers 'file
               (make-driver file-connect
                            file-disconnect
                            file-query
                            file-prepare
                            file-begin
                            file-commit
                            file-rollback))
    
Now we just need to flesh out file-query, which is the meat of the driver.


List the Files In a Path

Let's do it one step at a time - and the first step is to list the files in a path.  Keep in mind that the path is a *path* within the root directory, and since we have control over the specifications, let's make the path appear as an absolute path with Unix syntax.

Example - let's say we want to check the files located at path /foo/bar, and the root directory of the database is /var/bzlib/data/, then the combined path should be /var/bzlib/data/foo/bar.

The call through query would then look like:

(query handle 'list `((path . "/foo/bar")))  ;; notice it's a symbol 

This should return a list of the paths within /foo/bar.  Let's say there are 3 files, abc.txt, def.txt, and ghi.txt; the result would then look like:

'("/foo/bar/abc.txt" "/foo/bar/def.txt" "/foo/bar/ghi.txt")  

Note the returned paths are also absolute paths within the database - yes, this filesystem database does not work with relative paths.

All right - the following code satisfies our listing needs so far:

(require (planet bzlib/file))  
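(define (file-query handle stmt (args '())) 
  ;; map a database path onto the filesystem; no path means the root itself
  (define (path-helper path)
    (if (not path)
        (handle-conn handle)
        (build-path* (handle-conn handle) path)))
  ;; directory-list returns bare entry names, so rejoin each name with its
  ;; directory before making it relative to the database root
  (define (convert-path dir name)
    (relative-abs-path (handle-conn handle) (build-path* dir name)))
  (case stmt
    ((list)
     (let ((path (path-helper (assoc/cdr 'path args))))
       (if (directory-exists? path)
           (map (lambda (name) (convert-path path name))
                (directory-list path))
           #f)))
    (else 
     (error 'file-query "unknown statement: ~a" stmt)))) 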
 
The not-yet-released package bzlib/file contains utility functions for manipulating paths and files on top of scheme/path and scheme/file, a couple of which are introduced here:
  • build-path* - builds paths in a similar fashion to build-path, except the trailing segments can themselves be full paths instead of individual segments, i.e. the following is legal for build-path*: (build-path* "/var/data" "/abc/def/ghi" "/foo/bar") ;; => "/var/data/abc/def/ghi/foo/bar"
  • relative-abs-path - returns the path relative to a base, in the absolute form we specified above: (relative-abs-path "/var/data/" "/var/data/abc/def/ghi") ;; => "/abc/def/ghi"
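
Since bzlib/file is not released yet, here is a rough, self-contained sketch of the same two conversions using only scheme/base, scheme/list, and scheme/path. db-path->fs-path and fs-path->db-path are hypothetical stand-ins, not the actual build-path* and relative-abs-path; they assume that database paths are Unix-style strings, and they do nothing about ".." segments.

```scheme
#lang scheme/base
(require scheme/list    ; list helpers
         scheme/path)   ; simple-form-path, explode-path

;; "/foo/bar" -> '("foo" "bar"); empty segments produced by leading,
;; trailing, or doubled slashes are dropped
(define (db-path->segments db-path)
  (filter (lambda (s) (not (string=? s "")))
          (regexp-split #rx"/" db-path)))

;; hypothetical stand-in for build-path* with one trailing database path
(define (db-path->fs-path root db-path)
  (apply build-path root (db-path->segments db-path)))

;; hypothetical stand-in for relative-abs-path: strip the root's elements
;; from the front of fs-path and write the remainder as "/a/b/c"
(define (fs-path->db-path root fs-path)
  (let loop ((base (explode-path (simple-form-path root)))
             (rest (explode-path (simple-form-path fs-path))))
    (cond ((null? base)
           ;; the root is fully consumed; what is left lies inside it
           (if (null? rest)
               "/"
               (apply string-append
                      (map (lambda (name)
                             (string-append "/" (path->string name)))
                           rest))))
          ((and (pair? rest) (equal? (car base) (car rest)))
           (loop (cdr base) (cdr rest)))
          (else
           (error 'fs-path->db-path "~a is not inside ~a" fs-path root)))))

(db-path->segments "/foo/bar")
;; => '("foo" "bar")
(fs-path->db-path "/var/data/" "/var/data/abc/def/ghi")
;; => "/abc/def/ghi"
```

Raising an error for a path outside the root is a choice made for this sketch; the post does not say how relative-abs-path behaves in that case.
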

Return Format 

By default, bzlib/dbi does not enforce any sort of return format, so you can simply return the results as you see fit.  However, if you wish to use the query helper functions such as rows, cell/false, etc, you'll need to ensure the data is returned as a list of rows, where each row is also a list of cells (the cells can be anything, with the scheme null as database NULL), and the first row is the list of column names.  Using these helpers is optional, but you'll need to inform your users whether your driver works with them.

Of course - you can have your cake and eat it too if you provide two separate drivers: the first driver does not work with the query helper functions, while the second extends the first by wrapping the query results and converting them into the recordset format.  For our current example we can do the following:

(define (file-recordset-query handle stmt (args '())) 
  (let ((value (file-query handle stmt args))) 
    (case stmt 
      ((list) 
       (if (not value) value
           (map list (cons "path" value)))))))

(registry-set! drivers 'file/rs
               (make-driver file-connect
                            file-disconnect
                            file-recordset-query
                            file-prepare
                            file-begin
                            file-commit
                            file-rollback))

Now the 'file/rs driver will work with rows, cell, cell/false, etc.
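
To make the recordset shape concrete, here is a small standalone sketch applied to the three-file listing from earlier. flat-list->recordset, recordset-columns, and recordset-rows are hypothetical names used only for illustration; they are not the bzlib/dbi helpers (rows, cell, cell/false), whose internals are not shown in this post.

```scheme
#lang scheme/base

;; Hypothetical helpers for illustration; not part of bzlib/dbi.

;; Wrap a flat list of values into a one-column recordset: the first row
;; holds the column name, and every following row holds a single cell.
(define (flat-list->recordset column values)
  (map list (cons column values)))

;; The first row is the list of column names ...
(define (recordset-columns rs) (car rs))
;; ... and the remaining rows are the data.
(define (recordset-rows rs) (cdr rs))

(define listing
  (flat-list->recordset
   "path"
   '("/foo/bar/abc.txt" "/foo/bar/def.txt" "/foo/bar/ghi.txt")))

listing
;; => '(("path") ("/foo/bar/abc.txt") ("/foo/bar/def.txt") ("/foo/bar/ghi.txt"))
(recordset-columns listing)
;; => '("path")
(recordset-rows listing)
;; => '(("/foo/bar/abc.txt") ("/foo/bar/def.txt") ("/foo/bar/ghi.txt"))
```
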

Reading File(s)


Let's add one more capability in this post - let's read files based on the passed-in paths.  And to make it interesting, we'll take in multiple paths, so something like this:

(query handle 'open `((path . "/abc.txt") (path . "/def.txt") (path . "/ghi.txt")))
;; => (listof bytes?) 

The following accomplishes the goal:

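(define (file-query handle stmt (args '())) 
  ;; map a database path onto the filesystem; no path means the root itself
  (define (path-helper path)
    (if (not path)
        (handle-conn handle)
        (build-path* (handle-conn handle) path)))
  ;; directory-list returns bare entry names, so rejoin each name with its
  ;; directory before making it relative to the database root
  (define (convert-path dir name)
    (relative-abs-path (handle-conn handle) (build-path* dir name)))
  ;; collect every value stored under a 'path key, in argument order
  (define (get-paths)
    (map path-helper (map cdr (filter (lambda (kv)
                                        (equal? (car kv) 'path)) 
                                      args))))
  (case stmt
    ((list)
     (let ((path (path-helper (assoc/cdr 'path args))))
       (if (directory-exists? path)
           (map (lambda (name) (convert-path path name))
                (directory-list path))
           #f)))
    ((open)
     (map file->bytes (get-paths)))
    (else 
     (error 'file-query "unknown statement: ~a" stmt))))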
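
The get-paths helper relies on the argument list being allowed to repeat a key. A minimal standalone sketch of that idea follows; all-values is a hypothetical name, and the mode key is made up purely to show that other keys are skipped. The standard assq is included for contrast because it stops at the first match.

```scheme
#lang scheme/base
(require scheme/list)   ; list helpers

;; all-values is a hypothetical helper mirroring get-paths: it keeps the
;; value of every pair whose key matches, in their original order.
(define (all-values key args)
  (map cdr (filter (lambda (kv) (equal? (car kv) key)) args)))

;; the mode key is invented for this example only
(define example-args
  `((path . "/abc.txt") (mode . "r") (path . "/def.txt")))

(all-values 'path example-args)
;; => '("/abc.txt" "/def.txt")
(assq 'path example-args)
;; => '(path . "/abc.txt")
```
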

And to make it work for the 'file/rs driver, we should update file-recordset-query correspondingly:

(define (file-recordset-query handle stmt (args '())) 
  (let ((value (file-query handle stmt args))) 
    (case stmt 
      ((list) 
       (if (not value) value
           (map list (cons "path" value))))
      ((open)
       (map list (cons "content" value)))))) 

Now our drivers will read in all of the file contents as bytes and return them.  Stay tuned for the addition of other features...
