Monday, September 21, 2009

bzlib/dbd-file - filesystem-based database driver - now available

As previously promised, bzlib/dbd-file, a demonstration of how to extend bzlib/dbi around filesystem, is now available via planet.

You can revisit the posts that chronicled its developments:
  1. Overview of the DBI internals 
  2. a draft driver - list directories and open files
  3. enhance the driver - save files atomically
  4. enhance the driver - delete files atomically 

There isn't a prerequisite for this module, but there are a few caveats:
  • No transaction support (only atomicity)
  • Atomicity not guaranteed on Windows (it'll almost work most of the time)
  • No prepared queries 
  • No SQL - this is not a SQL database driver

The installation over planet is straight forward:

;; in REPL  
(require (planet bzlib/dbi))
(require (planet bzlib/dbd-file))


You should set aside a particular directory as the root for the database.  Let's call that <root-path<.  As with any database - you should not manually access the files within that directory to minimize the chance of destroying the database.

To connect to the database:

;; use the 'file driver  
(define handle (connect 'file <root-path>))  
;; or use the 'file/rs driver 
(define handle (connect 'file/rs <root-path>))

The difference between the two drivers is that 'file/rs returns values that are compatible with the DBI query helper functions such as rows, rows/false, cell/false, since it converts the underlying return value into recordset structure, so it is just a bit less efficient.

To disconnect from the database:

(disconnect handle) ;; NOOP  

The call itself is unnecessary since there are no external resources to clean up.  The prepare is also a NOOP.

To "select" the data from the files by paths:

(query handle 'open  `((path . "/path1") (path . "/path/to/file2") ...)) 
;; => a list of bytes, each bytes is the total content of the file 

 The paths used with the driver (either returned values or passed arguments) are always a jailed absolute path, which will be merged with the root directory to form the actual underlying path.

To "select" the paths a particular base path:

(query handle 'list `()) ;; list the files in the root directory
(query handle 'list `((path . "/some/path"))) ;; list the files in some path 
;; => returns a list of the jailed paths  

To "insert" or "update" a particular path:

(query handle 'save! `((path . "/some/path") (content . #"some bytes")))  

Both the path and content parameters are required in this particular case.  The file will be saved atomically.

To "delete" files or directories:

;; delete files (not directories)
(query handle 'delete! `((path . "/some/path") (path . "/some/other/path") ...))
;; delete empty directories

(query handle 'rmdir! `((path . "/some/path") (path . "/some/other/path") ...))

;; delete either files or directories 
(query handle 'rm-rf! `((path . "/some/path") (path . "/some/other/path") ...))

'delete! will only delete files, and will error if trying to delete directories. 'rmdir! will only delete empty directories, and will error trying to delete either files or non-empty directories.  rm-rf! will delete either files or directories (whether empty or not).  The deletion are atomic on non-Windows platform.

Notes About Windows

Since Windows locks opened filehandles, and generally have background processes such as antivirus softwares that randomly open files for inspections, you might find the save and deletion operations error out intermittently.  You can of course handle the errors and then retry the operations to get around the issues, but basically supporting atomic file-based save/deletion is out of scope.

That's it for now - enjoy.

No comments:

Post a Comment