Saturday, September 12, 2009

Extensible Abstract Database Interface for PLT Scheme

There are not a lot of choices of relational database interfaces for PLT - part of the reason is just that the community is not yet the size of Perl or Python, so not as many infrastructure packages yet.

However, PLT does have the interfaces for three of the most important open source databases - postgresql, mysql, and sqlite, so your open source database needs are likely addressed. But the challenge is that each sports a different interface, so you couldn't really abstract the database away to the degree that you can in other languages.

sqlid was probably the closest thing for PLT to have a single interface and multiple drivers, but it hasn't appeared to be active for a while, so I decided to take the plunge to create an abstracte database interface so all three packages can function uniformly. Hopefully it can attract collaborations from driver writers.

Intoducing bzlib/dbi - extensible abstract database interface for PLT

Inspired by Perl's DBI, the package bzlib/dbi follows the separation between the interface and the driver, so if you want to use a particular driver, you just need to require:

(require (planet bzlib/dbi)
(planet bzlib/dbd-<driver>))
Currently there are three such drivers available:
All of the four packages are released under LGPL.

To connect to the database, you just need to issue connect and pass it the symbol that identifies the driver, along with the arguments expected by the underlying driver. Once you have required bzlib/dbi and the appropriate driver, for example, to connect against schematics/spgsql:

(define handle (connect 'spgsql '#:server "localhost" '#:port 5432 ...))

And to connect against jaymccarthy/sqlite:

(define handle (connect 'jsqlite ':temp:)) ;; creates a temp database

To connect against jaz/mysql:

(define handle (connect 'jazmysql "localhost" ...))

Note that if the underlying driver expects keyword arguments, pass in the keyword with a quote, so PLT will treat it as an argument rather than a keyword for connect (which will result in an exception, since connect has no keyword arguments).

To disconnect, just do:

(disconnect handle)

Query and Named Parameters

The goal of bzlib/dbi is to keep the interface as simple as possible, so there is currently only one function that handles the queries:

(query handle "select * from ..." args)


The arguments is a list of pairs of symbol and values. It is currently not optional (so if you have no arguments you would still have to pass in '()).

The query statements takes in named parameteres instead of ordinal parameters, so instead of writing:

select * from table1 where c1 = ? and c2 = ?

You write

select * from table1 where c1 = ?c1 and c2 = ?c2 -- ?c1 & ?c2 are named parameters.

Then you pass in '((c1 . val1) (c2 . val2)) as the argument.

The nice thing about named parameters is that you do not have to worry about matching the order of the parameters, and you only need to specify the value once if you have to pass in the same value to multiple positions. The name can consist of regular alphanumeric characters, underscores, dashes, and exclamation mark.

bzlib/dbi currently standardizes on using question mark as the placeholder and translates the query and the parameters to their corresponding version for each of the drivers. This can be customized.

Prepared Statements

bzlib/dbi also provides the ability to create prepared statements:

;; (prepare handle name-symbol query)
;; example
(prepare handle 'select-table1 "select * from table1 where c1 = ?c1 and c2 = ?c2")
(query handle 'select-table1 `((c1 . ,val1) (c2 . ,val2)))
Basically - the prepared statement is created and kept along with the handle, and identified via the name symbol. And it is run via the same query interface by passing in the name symbol (instead of the query) along with the arguments.

Unlike other database interfaces where the prepared object is returned to the caller for passing around, bzlib/dbi hides them to 1) simplify the interface, and 2) minimize the chance of error when the same underlying handle is used by multiple threads at once - I settle on this design choice when running into issues with a prepared statement run by a separate thread where the handle was terminated by its originating thread so you wouldn't have to manage the same object with multiple threads.

Transactions

You can trigger transactions manually with the SQL statement.

(begin-trans handle)
(commit handle)
(rollback handle)

Or you can use the with-trans syntax, which wraps the transaction management with exception handling - so if there are errors triggered it would automatically cause rollback of the transaction.

(with-trans (handle1 ...)
exp ...)

You can wrap multiple database handles and if one errors all will be rolled back - this is a simple distributed transaction.

Data Type Mapping

The data type mapping depends on the underlying driver, but the following are the mapping for the general types:
  • integer <-> sql integer
  • number <-> sql float
  • srfi/19 date time <-> sql date/time
  • string <-> sql string/text
  • bytes <-> sql binary/blobs
  • null <-> sql null
For bzlib/dbd-spgsql, there is also a mapping for the array type as a list.

The recordset is returned as a list of lists. The first row holds the column names (so there will always be one row in a select query even if there are no results).

Additional Query Helpers

The following provides additional helpers that wraps around query to simplify the developer's tasks in certain situations:
  • exec - used for non select queries
  • rows - strips the column names and just return the rows
  • row - returns the first row (and raise error if no rows)
  • row/false - returns the first row or #f if there are no rows
  • exists? - a shorthand for row/false (looks more natural for people used to SQL)
  • cell - returns the first cell of the first row or throw an error if there are no rows
  • cell/null - returns the first cell of the first row or returns null
  • cell/false - returns the first cell of the first row or returns #f
All of the above helpers also takes in an optional converter parameter right after the args parameter, so you can directly process the result before it returns. The converter should be a function that can take in as many arguments as there are columns in the query. The full signature of these helpers look like:

(query-helper handle symbol-or-query key/val-args converter)


Extensible Database Interface

An important principle of the design is to make the database interface extensible, so code can be reused and new database driver can be developed more easily (of course it would still require intimate knowledge with the particular database's protocol). Hence the interface is extensible with the chaining of drivers.

The following drivers are currently available for extending the capability of the three database drivers.
  • app - provides a kill-safe interface over the raw database driver
  • pool - provides a database connection pool interface over the raw database driver
Kill-Safe App Driver

To make use of the kill-safe app driver, you need to pass in the 'app driver identifier to connect:

(define kill-safe-handle (connect 'app #f <inner-driver-args> ...))

So for example, if you want to make schematics/spgsql kill-safe, use the following:

(define handle (connect 'app #f 'spgsql '#:server "localhost" '#:port 5432 ...))

And if you want to do so with jaymccarthy/sqlite (which doesn't do much since jaymccarthy/sqlite is FFI-based and already serializes all threads):

(define handle (connect 'app #f 'jsqlite "database.db"))

What happens is that bzlib/dbi will initiate the app driver, which then initiates the underlying database driver. The #f arg is the argument that tells the app driver whether there is a pool driver that helps handle the response. You only need to explicitly pass this value in if you are initiating app driver directly.

Database Connection Pool

The database connection pool driver further builds on top of the app driver and manages the database connections. It works in conjunction with the app driver. For example - to initiate a database pool of maximum 10 database connections:

(define pool-handle (connect 'pool 10 'spgsql '#:server "localhost" '#:port 5432 ...))

You do not need to specify the app driver in this case, since the pool driver does that for you internally.

The pool driver is a bit special in that it does not directly handle the queries, but offers up one of its connections to the caller and let the caller use the connection until the caller thread dies. To retrieve a connection from the pool driver, do:

(query pool-handle 'connect '())

which would return the underlying app driver handle.

The app and pool driver demonstrates how bzlib/dbi can be extended. You can certainly extend it further to fulfill your needs (an example of possible future extension would be to create a load balancer handle that sits in front of a database cluster).

TODO

There are a lot more the design of bzlib/dbi can accomodate, but one of the important thing to do is to collaborate with the driver writers so there can be a joint effort. Since bzlib/dbi is the new kid on the block, the effort will have to come later after people have chance to test out the interface. But now you can switch between the three major database with relative ease.

2 comments:

  1. Stumped!

    New to Scheme, and I am not a programmer by trade. I'm excited about the MySQL interface, as I do a fair amount of things with that.

    Here's my problem (and I'm quite sure it's something really basic):

    I can connect using the following:

    (define handle (connect 'jazmysql "localhost" 3306 "user" "password" ))

    However, when I try to pass the optional database to connect to, it fails:

    (define handle (connect 'jazmysql "localhost" 3306 "user" "password" #:set-current? "database" ))

    connect: does not accept keyword arguments; arguments were: jazmysql "localhost" 3306 "user" "password" #:set-current? "database"


    If I just pass the database name to connect,procedure it also fails:

    connect: expects 4 arguments plus optional arguments with keywords #:schema, #:set-current?, and #:use-ssl?, given 5: "localhost" 3306 "user" "password" "database"

    So, in one instance scheme is telling me that connect will accept optional arguments, and in the next it says it won't.

    Cornfusement reigns!

    ReplyDelete
  2. Hi - the reason that you are seeing the error of "keywords" is that you did not escape the keyword with quote.

    so instead of #:<keyword>, you need to use '#:<keyword>.

    In the case of jazmysql, it means you need to pass the following (note the quote in front of #:schema):

    (define handle (connect 'jazmysql "localhost" 3306 "user" "password" '#:schema "database"))

    The reason is because the keyword #:schema is passed to the underlying jaz/mysql driver as a variable. When it is not quoted, PLT Scheme treat it as the keyword variable for connect.

    HTH.

    ReplyDelete