Tuesday, December 22, 2009

BZLIB/XML.plt - An XML Utility for Xexpr and SXML

BZLIB/XML.plt is now available via PLANET. It provides the following utilities to help with XML manipulation:
  • converting between xexpr and sxml
  • reading sxml/xexpr from html or xml sources
  • managing html and xml entities

Installation


(require (planet bzlib/xml)) 

Xexpr and SXML

Although Xexpr is the default xml representation in PLT Scheme (and the web-server), it lacks the tooling that SXML enjoys. bzlib/xml helps by providing functions to convert between sxml and xexpr:

;; convert from xexpr to sxml 
(xexpr->sxml `(p ((class "default")) "an xexpr instance"))
;; => `(p (@ (class "default")) "an xexpr instance") 

;; convert from sxml to xexpr 
(sxml->xexpr `(p (@ (class "default")) "an xexpr instance"))
;; => `(p ((class "default")) "an xexpr instance") 
Converting from xexpr to sxml allows you to use facilities such as sxpath, ssax, and sxml-match with xexpr, and converting from sxml to xexpr allows you to feed sxml into the web-server to generate output.

Reading and Writing Xexpr/SXML

bzlib/xml provides read-xexpr and read-sxml to simplify the conversion from html sources to either xexpr or sxml:
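For instance, reading from a string port presumably looks something like this (the exact argument conventions are assumptions, not the package's documented signatures):

;; presumably read-xexpr/read-sxml consume a port containing html or xml
(read-xexpr (open-input-string "<p class=\"default\">an example</p>"))
;; => `(p ((class "default")) "an example") 
(read-sxml (open-input-string "<p class=\"default\">an example</p>"))
;; => `(p (@ (class "default")) "an example") 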

Sunday, November 15, 2009

BZLIB/SESSION.plt - a Session Store via BZLIB/DBI.plt

I originally planned to write a series on the development of a session store, but it turned out that there aren't that many things to write about, so I am just going to release the code via planet. As usual, this is released under LGPL.

Installation

To download the planet package:

(require (planet bzlib/session)) 

The package comes with three separate database installation scripts (one each for sqlite, mysql, and postgresql). You can call them via the following:

;; installing to a sqlite database
(require (planet bzlib/session/setup/jsqlite)) 
(setup-session-store/jsqlite! <path-to-sqlite-db>) 

;; installing to a mysql database 
(require (planet bzlib/session/setup/jazmysql)) 
(setup-session-store/jazmysql! <host> <port> <user>  
                               <password> <schema>)

;; installing to a postgresql database 
(require (planet bzlib/session/setup/spgsql)) 
(setup-session-store/spgsql! <host> <port> <user>
                             <password> <database>) 
Known Issue: the script currently can only be run once and it assumes the table session_t does not exist in the target database. This will be rectified in the future.

Session ID

The session store uses uuids for session IDs. bzlib/base provides an API for manipulating uuids.

To create a uuid, just run (make-uuid). It optionally takes a single argument: either a uuid string, from which it creates the corresponding uuid structure, or another uuid structure, from which it makes an equivalent uuid structure.

> (require (planet bzlib/base))
> (make-uuid)
#<uuid:4ba52eac-a0b4-415a-88f5-57d1fadd1aba>
> (make-uuid "4ba52eac-a0b4-415a-88f5-57d1fadd1aba")
#<uuid:4ba52eac-a0b4-415a-88f5-57d1fadd1aba>
> (make-uuid (make-uuid "4ba52eac-a0b4-415a-88f5-57d1fadd1aba"))
#<uuid:4ba52eac-a0b4-415a-88f5-57d1fadd1aba>

Monday, November 9, 2009

Building a Web Session Store (1)

Given that we have previously determined the need for a web session store even if we are using continuations, we'll go ahead and build it on top of our DBI stack, so the session data can be persisted as long as necessary.

Quick Word About Session Store Performance

One thing to note about session data is that its usage is both read and write intensive, which can put strain on the database. It's write-intensive because we extend the session's expiration time with every request, and it's read-intensive because the data is needed for every request.

For now we'll assume that our database is capable of handling such needs (and it will be until you have a sufficiently large amount of traffic), but it's something to keep in mind. The nice thing about building the session logic on top of DBI is that when we need to deal with the performance issue, we can easily add logic into the DBI tier by developing a custom driver - for example, by integrating memcached as an intermediate store that flushes changes out to the database once in a while instead of with every request.

Active Record

The active record pattern is not just for OOP fanatics - we schemers know that you can craft your own OOP with FP easily. DBI today includes a base structure for active record definitions:

(define-struct active-record (handle id)) 
Such a definition is a lot simpler than the usual OOP representations, which try to construct the data model in memory along with dynamically constructed SQL statements. Although such OOP records provide simplicity for the simple cases, they have proven to be a leaky abstraction due to the object vs relational paradigm mismatch, as well as a significant performance overhead. Our simple definition will do just fine for now.

What would our session API look like then?

;; expiration is a julian-day 
;; store is a hash table 
(define-struct (session active-record) (expiration store) #:mutable) 

;; the session key/value manipulation calls... 
(define (session-ref session key (default #f)) ...) 
(define (session-set! session key val) ...) 
(define (session-del! session key) ...) 
;; the persistence calls 
(define (build-session handle ...) ...) 
(define (save-session! session) ...) 
(define (refresh-session! session) ...) 
(define (destroy-session! session) ...) 
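A hypothetical usage sketch (the exact argument lists are assumptions, since the API above is only a skeleton):

;; assume h is a DBI handle opened against the session store
(define s (build-session h))          ;; create and persist a new session
(session-set! s 'user-id 42)          ;; stash a value
(session-ref s 'user-id)              ;; => 42
(session-ref s 'missing 'not-found)   ;; => 'not-found (the supplied default)
(save-session! s)                     ;; push the changes back to the store
(refresh-session! s)                  ;; extend the expiration
(destroy-session! s)                  ;; remove the session entirely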

Saturday, November 7, 2009

Web Sessions vs. Continuations

The "session info in web server applications" thread recently in plt-scheme list has an undertone that continuations are equivalent of web sessions as understood in other languages and frameworks. This undertone is highlighted by the lack of a session-like capability within the web-server collection that exists in other web frameworks.

This got me thinking: are continuations equivalent to sessions?

The original intent of web-server's continuations (indicated in Shriram's research paper) is to correctly and succinctly model an interactive web application's flow. The paper cites examples of incorrectly implemented web apps that would do something like the following:
  1. the user browses a list of goods
  2. the user opens a new window to get the details of goods A
  3. the user goes back to the original window
  4. the user then opens another new window to get the details of goods B
  5. the user then goes back to goods A and clicks "Buy Now"
  6. an incorrectly implemented app will cause the user to buy goods B instead of goods A
The traditional solution to the above interaction would be to use sessions, and since continuations model such interactions as well, there is no question that in this case continuations supplant the need for sessions.

But for other scenarios involving sessions it might be more natural to model the computations by using the traditional session concepts.

For example - identifying the user across visits after significant time lapse (this is generally toggled by a "remember me" checkbox during login). Normally web sites accomplish this by persisting the user's authenticators via cookies or sessions.

This process is awkward to model with continuations, since the user likely comes back to the site via a top-level link that has no captured continuation, instead of digging up the last continuation url for the site, and the continuations might have expired between the visits if you use stateful servlets.

If you use web-server's stateless servlet language, an approach is probably to serialize the continuation into a cookie so it can model the above scenario, but you'll have to write your code in the stateless language or convert your code over, and it feels like a more complex solution compared to simply having a regular session capability. This is similar to using continuations to model non-interactive web links - it can work, but it does not follow Occam's razor.

Furthermore - if your site uses extensive ajax, your use of continuations will decrease, since ajax models the interactions as well and supplants the need for continuations. In such a case you might regain the need for the sessions that continuations had displaced.

So, as far as I can tell, continuations are not equivalent to web sessions and do not eliminate the need for session capabilities.

Friday, November 6, 2009

Using DBI to Run Scripts & Load Prepared Statement Scripts

There are a couple of utility functions that I have designed into DBI that were not previously discussed. They are both oriented toward working with SQL scripts.

I don't know about you, but I like to write SQL statements in SQL scripts:

-- a sql query inside a sql script
select * 
  from table1;
instead of embedding them as strings in programming languages:

;; a sql query embedded in scheme code 
(exec handle "select * from table1")
- it just looks so much nicer.

If you have a ton of complex prepared statements, you'll find such statements littered everywhere, which makes them difficult to maintain.

Of course - a possible solution to this problem is to create a DSL so the SQL strings can be generated. We might eventually entertain such a solution, but it's clear that any such DSL will be non-trivial.

A simpler but equally powerful idea is to move all such queries into SQL scripts, and have the database handle load the scripts and convert them into prepared statements.

Loading Prepared Statement Scripts

You can supply a #:load parameter to the three RDBMS drivers' connect procedure:

;; example - connect to spgsql 
(define handle 
  (connect 'spgsql <regular-args> ... 
           '#:load <path-string? or (listof path-string?)>))
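For example, pointing a sqlite connection at a couple of query scripts might look like the following (the database and script paths here are hypothetical):

;; hypothetical paths - substitute your own database and script locations
(define handle 
  (connect 'jsqlite "/tmp/test.db" 
           '#:load (list "queries/users.sql" "queries/reports.sql")))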

Thursday, November 5, 2009

Latest DBI.plt Available - Handling Last Inserted ID and Side Effects

The newest version of DBI (and the 3 RDBMS drivers) has now been made available on planet - it addresses the issue of side effects and last inserted id.

As usual - they are released under LGPL.

To download & install - use require:

(require (planet bzlib/dbi:1:3) 
         (planet bzlib/dbd-jazmysql:1:2) 
         (planet bzlib/dbd-jsqlite:1:3) 
         (planet bzlib/dbd-spgsql:1:2))
Unifying the side effects and the last inserted id turns out to be a non-trivial task - I have already voiced the options on the plt-scheme list, but I am repeating them here for the sake of completeness:
  • the underlying drivers return different values for side effects
  • there might be legacy code that utilizes the underlying side effect results - which can increase the effort of porting to another database
  • not all drivers provide ways to retrieve all side effect values
  • last inserted id does not work consistently when inserting multiple records at once - luckily this is generally not how insertion is used
  • for postgresql - we'll need to derive the underlying table-to-sequence-name mapping in order to determine the last inserted id, and this requires a separate query that you might not want to "pay for" unless you need the last inserted id
Based on the above design constraints, I have chosen the following:
  • make available multiple drivers for each database - one for each type of side effect result
  • default the side effect results to a unified effect structure, which is inspired by jazmysql
  • provide last-inserted-id for the 3 RDBMS drivers (SPGSQL has a special requirement, described below)

The Different Side Effects

There are 3 separate side effect types:
  • past-thru-effect - this is the side effect available for backward compatibility. Whatever side effect objects the underlying driver returns are passed through directly, and you can use the underlying driver's code to access the values
  • effect - this is the unified side effect object and the default going forward
  • effect-set - this converts the effect structure into a result set

For example, if you want to make a connection with the first side effect type with bzlib/dbd-spgsql, you can do the following:
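Purely as an illustration, such a connection might look like the following - the driver name below is hypothetical, so consult the package documentation for the actual registered names:

;; hypothetical driver name for the pass-thru side effect type
(define handle 
  (connect 'spgsql/past-thru <host> <port> <user> <password> <database>))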

Tuesday, October 20, 2009

BZLIB/DATE & BZLIB/DATE-TZ 0.2 Now Available

bzlib/date and bzlib/date-tz 0.2 are now available via planet. They are released under LGPL. See the previous documentation for the earlier usage.

The changes included are:
  • re-exports for SRFI-19 functions
  • Wrapper functions for PLT date objects (you can use the functions with PLT date objects instead of with SRFI date objects)
  • day comparison functions
  • RFC822 date parsers and generators
  • additional date manipulation functions

SRFI19 Re-export
Previously you had to explicitly require srfi/19 to use the SRFI-19 functions, as follows:

(require srfi/19 (planet bzlib/date)) 
Now you just have to do the following:

(require (planet bzlib/date/srfi)) 
And almost all srfi/19 functions will be re-exported along with the functions within bzlib/date.

The exceptions are date->string and string->date, neither of which is re-exported. This is because we may want to use those names for our own date parser and generator functions. I'll examine the details before deciding whether to re-export those or create our own.

PLT Date Wrappers

You can now use PLT date objects instead of srfi/19 date objects (I do not really know why there are two different date structures in the first place...). You can just do the following:

(require (planet bzlib/date/plt)) 
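With that module required, the bzlib/date functions accept the PLT date structs produced by the core library - for example (results are illustrative):

;; seconds->date produces a PLT date struct
(define now (seconds->date (current-seconds)))
(define yesterday (seconds->date (- (current-seconds) 86400)))
(week-day now)         ;; => the current day of the week (0 = Sunday)
(date>? now yesterday) ;; => #t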

Monday, October 19, 2009

Building Parser Combinators in Scheme (2) - Higher Order Combinators

Previously we started building parser combinators for parsing a symbol that has the following signature (seq alpha (zero-more (one-of alpha numeric))), and we stopped at refactoring the numeric and alpha parsers, and ended up with return, fail, and char-test.

char-test Expanded

char-test gives us a good base for building a bunch of other character-based parsers:

;; char= returns a parser to test whether the next char equals c
(define (char= c)
  (char-test (lambda (it)
               (char=? it c)))) 

;; in-chars returns a parser to match against the list of chars
(define (in-chars lst)
  (char-test (lambda (it) 
               (member it lst)))) 

;; not-in-chars is the reverse of in-chars 
(define (not-in-chars lst)
  (char-test (lambda (it) 
               (not (member it lst))))) 

;; in-char-range returns a parser to match char between from & to chars
(define (in-char-range from to)
  (char-test (lambda (it)
               (char<=? from it to))))

;; the opposite of in-char-range
(define (not-in-char-range from to)
  (char-test (lambda (it)
               (not (char<=? from it to)))))
So we can build parsers such as the following:

;; a parser to test for the backslash
(define backslash (char= #\\)) 
;; writing numeric using in-chars
(define numeric (in-chars '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)))
;; the equivalent of regexp \S (non-whitespace) 
(define non-whitespace (not-in-chars '(#\tab #\return #\newline #\space #\vtab)))
;; lower-case alpha using in-char-range 
(define lower-case (in-char-range #\a #\z))
;; upper-case alpha using in-char-range
(define upper-case (in-char-range #\A #\Z)) 
It would be nice to write alpha in terms of upper-case and lower-case as defined above:

(define alpha (one-of upper-case lower-case)) 
So let's see how we can write one-of.

Higher order Parsers

The idea of one-of is straightforward - take in a list of parsers to test against the input, one parser at a time, and return the result of the first one that succeeds:
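As a minimal sketch - assuming, as in the earlier installments, that a parser is a function that takes an input port and returns either a result value or #f on failure without consuming input when it fails - one-of can be written as:

;; try each parser in turn; the first non-#f result wins
(define (one-of . parsers)
  (lambda (in)
    (let loop ((parsers parsers))
      (cond ((null? parsers) #f) ;; all of the alternatives failed
            (((car parsers) in) => values)
            (else (loop (cdr parsers)))))))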

Friday, October 16, 2009

Building Parser Combinators in Scheme

I do not write code for the sake of writing code, but sometimes the best way to understand a concept is to write it up. I can think of many situations where I have no clue what the heck is going on after reading the documentation or even the tutorials, but I start to understand what's going on once I start to hack up a solution.

Parser combinators are one of those things for me. From what I can find, Haskell is where the term was coined and popularized, but unfortunately I cannot read Haskell well enough to fully follow the excellent monadic parser combinator paper, let alone understand Parsec and think about how it would translate into scheme.

PLT Scheme contains a parser combinator library, but its documentation is written for someone who is already completely familiar with parser combinators and how they would work within scheme, so I couldn't start with it (in contrast, I was able to figure out parser-tools in short order, even though my previous experience with lex & yacc isn't extensive either).

Searching online, I found a couple of parser combinators written in scheme. It wasn't much, but it was something I could build on. Let's get started...

Immediate Obstacle - Reading from Port Instead of String or List

Parser combinators promise top-down parser development in an almost BNF-like fashion, including infinite lookahead and unlimited backtracking, all of which are extremely fascinating powers.

But reading the monadic parser combinator paper left me scratching my head wondering how this would work with ports. The parser combinator appears to derive its backtracking power from reading the string into a list of characters. This is nice and all, but it doesn't work with ports, since once a character is read from a port it is gone and you lose the ability to backtrack (yes - with some ports it is possible to backtrack, but this does not work with all ports).

I wasn't able to find an answer to this problem via the above URLs either, so I'll have to come up with my own solution.

What I came up with was the following:
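One common way to get backtracking over an arbitrary port - not necessarily the approach this series settles on - is to have parsers peek at the port and report how many characters they matched, so the caller only consumes input on success:

;; a parser that peeks rather than reads, returning (value . chars-matched) or #f
(define (digit-parser in)
  (let ((c (peek-char in)))
    (and (char? c) (char<=? #\0 c #\9) (cons c 1))))

;; commit the matched characters only when the parse succeeds
(define (try-parse parser in)
  (let ((result (parser in)))
    (when result (read-string (cdr result) in))
    result))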

Thursday, October 15, 2009

Parsing "Encoded Word" in RFC Headers (3) - Parsing & Decoding

In the previous two posts we built the capability to generate encoded words; now it's time to decode them. We'll start with detecting whether a string is an encoded word.

Testing for Encoded Word

Remember that an encoded word has the following format:

=?<charset>?<Q or B>?<encoded data>?=
The above format can be succinctly specified via a regular expression:

(define encoded-word-regexp #px"=\\?([^\\?]+)\\?(?i:(b|q))\\?([^\\?]+)\\?=")

(define (encoded-word? str)
  (regexp-match encoded-word-regexp str))
Since the format is not recursive, a regular expression is good enough, even though it can be considered ugly. You can certainly try other approaches, such as a hand-written lexer, or using parser-tools to do the job. For now we keep things simple.
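For example, running it over one of the headers from the RFC1342 example returns the captured charset, encoding, and data:

(encoded-word? "=?ISO-8859-1?Q?Andr=E9_?=")
;; => '("=?ISO-8859-1?Q?Andr=E9_?=" "ISO-8859-1" "Q" "Andr=E9_")
(encoded-word? "plain text") ;; => #f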

Decoding

Once we can test whether a string is an encoded word, we can then use it to handle the decoding:

(define (encoded-word->string str)
  (if-it (encoded-word? str)
         (apply decode-encoded-word (cdr it))
         str))
If the string is an encoded word, we decode it; otherwise we return it verbatim. This way regular strings can be passed into this function as well.

The decode-encoded-word function looks like the following:
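As a rough sketch of what it might look like - reusing bytes/charset->string from the charset handling post and the stock net/base64 and net/qp decoders (the underscore-to-space substitution is the Q-encoding-specific twist; the actual implementation may differ):

(require net/base64 net/qp)

(define (decode-encoded-word charset encoding data)
  (let ((bytes (cond ((string-ci=? encoding "b")
                      (base64-decode (string->bytes/utf-8 data)))
                     ((string-ci=? encoding "q")
                      (qp-decode (string->bytes/utf-8
                                  (regexp-replace* #rx"_" data " ")))))))
    (bytes/charset->string bytes charset)))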

Wednesday, October 14, 2009

Parsing "Encoded Word" in RFC Headers (2) - Charset Handling & Multiple Encoded Words

In the previous post we discussed the Q and B encodings and ended with a charset-mismatch bug when the charset is not utf-8; let's try to fix that bug here.

It would be nice if we could use local charsets such as iso-8859-1 or big5 when we know for sure that the charset contains all of the characters that appear in the string (of course, it is the developer's responsibility to choose the right charset; the code will error out if the charset does not match the data).

PLT Scheme provides convert-stream to help convert bytes from one charset to another. On top of this function we can build helpers that take strings or bytes and return strings or bytes. What we want is something like:

(bytes/charset->string #"this is a string" "ascii") ;; => returns a string
(bytes/charset->bytes/utf-8 <bytes> <charset>) ;; => returns a bytes
The idea is that we'll convert the input data to an input port, and then retrieve the converted data from the output port, which will be a bytes port.

So let's start with a helper function that takes an input port and the charsets, and returns bytes:

(define (port->bytes/charset in charset-in charset-out)
  (call-with-output-bytes 
   (lambda (out)
     (convert-stream charset-in in charset-out out))))
Then we can have the following:

(define (bytes->bytes/charset bytes charset-in charset-out)
  (port->bytes/charset (open-input-bytes bytes) charset-in charset-out))
And we can define converting bytes to and from utf-8:

(define (bytes/charset->bytes/utf-8 bytes charset)
  (bytes->bytes/charset bytes charset "utf-8")) 

(define (bytes/utf-8->bytes/charset bytes charset)
  (bytes->bytes/charset bytes "utf-8" charset))
And finally we can then return strings on top of these two functions:

;; there are more to handle (specifically charsets).
(define (bytes/charset->string bytes charset)
  (bytes->string/utf-8 (bytes/charset->bytes/utf-8 bytes charset)))

(define (string->bytes/charset string charset)
  (bytes/utf-8->bytes/charset (string->bytes/utf-8 string) charset))
With the above functions, we can now ensure to convert the encoded word into the correct charset:
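A quick round trip with the helpers (the byte values shown are illustrative - 0xE9 is é in latin-1):

(string->bytes/charset "André" "iso-8859-1")     ;; => #"Andr\351"
(bytes/charset->string #"Andr\351" "iso-8859-1") ;; => "André"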

Tuesday, October 13, 2009

Parsing "Encoded Word" in RFC Headers

If you want to correctly handle internet message headers as defined in RFC822 (or as improved by RFC2822), you'll find that you currently have no way of handling encoded words, which are defined separately in RFC1342.

Below is an example of encoded words in message headers, taken from RFC1342:

From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
 =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
which should be decoded into

From: Keith Moore <moore@cs.utk.edu>
To: Keld Jørn Simonsen <keld@dkuug.dk>
CC: André Pirard <PIRARD@vm1.ulg.ac.be>
Subject: If you can read this you understand the example.
But currently net/head cannot handle encoded words, and they are not parsed:

(extract-all-fields <the-above-string>)
;; => 
'(("From" . "=?US-ASCII?Q?Keith_Moore?= ")
 ("To" . "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= ")
 ("CC" . "=?ISO-8859-1?Q?Andr=E9_?= Pirard ")
 ("Subject"
  .
  "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\r\n =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?="))
So we'll need to handle it ourselves. Let's get started.

The Format of an Encoded Word

An encoded word has the following format:

=?<charset>?<Q or B>?<encoded data>?=
And an encoded word should not exceed 75 bytes (including all the delimiters). If the string being encoded cannot fit within that length, it should be split into multiple encoded words separated by a space or linear folding whitespace (\r\n\s*). Encoded words can coexist with plain text in the same header (shown above in the CC header).

Monday, October 5, 2009

Introducing BZLIB/DATE & BZLIB/DATE-TZ - Date Time Manipulation Libraries

BZLIB/DATE & BZLIB/DATE-TZ are now available on planet. They provide additional date manipulation capability on top of SRFI/19, including timezone manipulation.

They are released under LGPL.

Usage and Installation

(require (planet bzlib/date)) 
(require (planet bzlib/date-tz)) 
bzlib/date provides date manipulations. bzlib/date-tz provides timezone manipulations. Their usage is discussed separately below.

bzlib/date

To create a date object, you can use build-date, which provides a more natural year/month/day/hour/minute/second order of the parameters:

(build-date <year> <month> <day> <hour> <minute> <second> #:tz <offset>) 
And it would do the right thing if you enter February 31st:

(build-date 2009 2 31) ;; => #(struct:tm:date 0 0 0 0 3 3 2009 0) 
By default, the tz offset is 0, which equates to GMT (see below for timezone support). Only year, month, and day are required.

Date Comparisons
The following functions compare two dates to determine their order:
  • (date>? <d1> <d2>)
  • (date<? <d1> <d2>)
  • (date>=? <d1> <d2>)
  • (date<=? <d1> <d2>)
  • (day=? <d1> <d2>)
  • (date!=? <d1> <d2>)
  • (date===? <d1> <d2>)
day=? only compares the year/month/day values, and date===? means they are the same date with the same tz offset.
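For example:

(date>? (build-date 2009 2 2) (build-date 2009 1 1))        ;; => #t
(day=? (build-date 2009 1 1 12 30 0) (build-date 2009 1 1)) ;; => #t (time of day is ignored)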

Conversions

You can convert between date and seconds with (date->seconds <date>) and (seconds->date <seconds>).

You can add to a date with (date+ <date> <number-of-days>). The number of days can be a non-integer.

You can find the gap between two dates with (date- <date1> <date2>).

You can create an alarm event from a date via (date->alarm <date>) or (date->future-alarm <date>). The difference between the two is that date->future-alarm will return false if the date is in the past.
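A quick sketch of these conversions (the return values are illustrative, and date- is assumed to report the gap in days, mirroring date+):

(date->seconds (build-date 1970 1 1)) ;; => 0, the epoch, given the default GMT offset
(date+ (build-date 2009 1 1) 1.5)     ;; => a date representing 2009/1/2 12:00:00
(date- (build-date 2009 1 3) (build-date 2009 1 1)) ;; => 2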

Dealing with Weekdays

To find out the weekday of a particular date, you can use (week-day <date>).

To find out the date of the nth weekday (e.g., first Sunday, 3rd Wednesday, last Friday) of a particular month, use nth-week-day:

(nth-week-day <year> <month> <week-day> <nth> <hour> <minute> <second> #:tz <offset>) 
For the week-day argument, use 0 for Sunday, and 6 for Saturday. For the nth argument, use 1, 2, 3, 4, 5, or 'last. The hour, minute, second, and offset arguments are optional (same as build-date, and the other functions below that have them).
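For example:

;; the first Sunday of November 2009
(nth-week-day 2009 11 0 1)     ;; 11/1/2009
;; the last Friday of November 2009
(nth-week-day 2009 11 5 'last) ;; 11/27/2009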

To find out the date of a particular weekday relative to another date, use one of the following:
  • week-day>=mday
  • week-day<=mday
  • week-day>mday
  • week-day<mday
They all share the same arguments, which are year, month, week-day, month-day, hour, minute, second, and offset.
The usage is something like:

;; the sunday after May 15th, 2009
(week-day>mday 2009 5 0 15) ;; 5/17/2009 
;; the friday before September 22nd, 2008 
(week-day<mday 2008 9 5 22) ;; 9/19/2008 
The hour, minute, second, and offset parameters are there for you to customize the return values:

;; the sunday after May 15th, 2009
(week-day>mday 2009 5 0 15 15 0 0 #:tz -28800) ;; 5/17/2009 15:00:00-28800
;; the friday before September 22nd, 2008 
(week-day<mday 2008 9 5 22 8 30 25 #:tz 14400) ;; 9/19/2008 08:30:25+14400

bzlib/date-tz

The timezone functions consult the current-tz parameter, which defaults to America/Los_Angeles; parameterize it to use a different timezone. The timezone names are the ones available in the Olson database.

To determine the offset of any date for a particular timezone, use tz-offset:

(parameterize ((current-tz "America/New_York")) 
  (tz-offset (build-date 2008 3 9))) ;; => -18000 
(parameterize ((current-tz "America/New_York")) 
  (tz-offset (build-date 2008 3 10))) ;; => -14400 
If you want to separate the standard offset from the daylight saving offset, you can use tz-standard-offset or tz-daylight-saving-offset:

(let ((d1 (build-date 2008 3 9))
        (d2 (build-date 2008 3 10)))
    (parameterize ((current-tz "America/New_York"))
      (values (tz-standard-offset d1)
              (tz-daylight-saving-offset d1)
              (tz-daylight-saving-offset d2))))
;; => -18000 (std)
;; => 0 (dst on 3/9/2008)
;; => 3600 (dst on 3/10/2008) 

Conversion

To reset a date's tz offset, you can use the helper function date->tz, which will reset the offset for you:

(let ((date (build-date 2008 3 10 #:tz -18000))) 
  (parameterize ((current-tz "America/New_York")) 
    (date->tz date))) 
;; => #(struct:tm:date 0 0 0 0 10 3 2008 -14400)
This function is meant for you to fix the offsets for dates that belong to a particular timezone but did not correctly account for the offset - it does not switch the timezone for you.

A couple of other functions make it even easier to work with timezones.

(build-date/tz <year> <month> ...)
(date+/tz <date> <number-of-days>)
They basically create the date object and call date->tz so the offset is properly adjusted for the timezone.

Besides using current-tz, you can also pass the timezone explicitly to tz-offset, tz-standard-offset, tz-daylight-saving-offset, date->tz, build-date/tz, and date+/tz. You pass it in the following forms:

(tz-offset <date> "America/Los_Angeles")
(tz-daylight-saving-offset <date> "Asia/Kolkata")
(tz-standard-offset <date> "Europe/London")
(date->tz <date> "Europe/London")
(build-date/tz <year> <month> <day> #:tz "America/New_York")
(date+/tz <date> <number-of-days> "America/Los_Angeles")

Convert from One Timezone to Another

To convert a date from one timezone to another - so you get the same moment expressed in a different timezone - use tz-convert:

(tz-convert <date> <from-timezone> <to-timezone>)
All parameters are required.

(tz-convert (build-date 2008 3 10 15) "America/New_York" "America/Los_Angeles")
;; => #(struct:tm:date 0 0 0 12 10 3 2008 -25200) ;; 2008/3/10 12:00:00-25200
(tz-convert (build-date 2008 3 10 15) "America/New_York" "GMT")
;; => #(struct:tm:date 0 0 0 19 10 3 2008 0) ;; 2008/3/10 19:00:00+00:00

That's it for now - enjoy.

Thursday, October 1, 2009

Handling Time Zone in Scheme (5): Pre-Convert the Rules

This is part five of the timezone series - you can find the previous posts on this subject to get up to speed on the details:
  1. motivation and overview
  2. parsing the zoneinfo database
  3. compute the offsets
  4. convert the rules into dates

We previously finished the first draft of tz-offset, and it works correctly in the majority of situations. But unfortunately, there is one issue with it:
The code will not work correctly when there are gaps between the years where the rules are applicable.
Fortunately, under regular use of the code, we will not encounter this issue. But the issue definitely exists. Below is the US rule:

  ("US"
   (1918 1919 - 3 (last 0) (2 0 0 w) 3600 "D")
   (1918 1919 - 10 (last 0) (2 0 0 w) 0 "S")
   (1942 1942 - 2 9 (2 0 0 w) 3600 "W")
   (1945 1945 - 8 14 (23 0 0 u) 3600 "P")
   (1945 1945 - 9 30 (2 0 0 w) 0 "S")
   (1967 2006 - 10 (last 0) (2 0 0 w) 0 "S")
   (1967 1973 - 4 (last 0) (2 0 0 w) 3600 "D")
   (1974 1974 - 1 6 (2 0 0 w) 3600 "D")
   (1975 1975 - 2 23 (2 0 0 w) 3600 "D")
   (1976 1986 - 4 (last 0) (2 0 0 w) 3600 "D")
   (1987 2006 - 4 (match 0 >= 1) (2 0 0 w) 3600 "D")
   (2007 +inf.0 - 3 (match 0 >= 8) (2 0 0 w) 3600 "D")
   (2007 +inf.0 - 11 (match 0 >= 1) (2 0 0 w) 0 "S"))
You can see that there is a gap between 1945 and 1967 in the applicable rules. You are probably not going to calculate offsets for 1948 most of the time, but it would be nice not to have to face the potential issue.

What we want is to have the last rule of 1945 continue to be applicable until 1967 in this case, and that means we need to be able to skip over multiple years, which our current design does not account for. The nice thing is that the situation is not as dire as it sounds, since most of the timezones I inspected visually do not make use of the "US" rule during this gap. But it would be nice to know that such a potential bug does not exist.

As we found out in the previous posts, inferring the "previous" rules from the data format is difficult, while computing the applicable rules for a given year is easy, so the easiest solution is to pre-compute the dates for every year we need to worry about. That way, any gaps are automatically filled with the previously applicable rules for the applicable years.

The Applicable Years

It should be obvious that the timezone concept has a definite bound toward the past, as GMT was not established until 1675, and the US did not adopt standard time zones until 1918. The minimum year in the zoneinfo database is 1916:

Wednesday, September 30, 2009

DBI and SQL Escape

Scott Hickey has discovered a bug in DBI:

;; assume you have a table1 with an id and a date field. 
(exec h "insert into table1 values (?id , ?date)" `((id . 1) (date . ,(srfi19:current-date))))
;; => regexp-replace*: expects type  as 2nd argument, given:
;;    #(struct:tm:date 9150000 8 19 0 30 9 2009 -18000); other arguments were: #px"\\'" "''"
This issue is now fixed, and the newer versions of DBI and the drivers are now available through planet.
This post documents the issue and the resolution of the bug.

This issue is caused by the default SQL escape code that does not know how to handle srfi date objects. The default SQL escape code is quite primitive for the reasons below.

To work around the problem - you can use prepared statements:

(prepare h 'insert-table1 "insert into table1 values (?id , ?date)")
(exec h 'insert-table1 `((id . 1) (date . ,(srfi19:current-date))))
The default preparation code exists as a prepared statement proxy for those database drivers that have no prepared statement capabilities. This is one of the selling points of DBI over other APIs - queries always allow parameterization. But that means DBI cannot delegate the task of SQL escaping back to the user.

Because my usage of databases has always centered on prepared statements, I did not write an extensive SQL escaping library (hence the bug). Plus, there were technical reasons why prepared statements are superior:
  • SQL escaping does not work uniformly across all databases, especially for types such as blobs and date objects, for which each database has its own syntax (and some basically discourage using SQL escapes for blobs)
  • SQL escapes are prone to SQL injection if done poorly. One of my previous gigs was to weed out SQL injection bugs in a client code base, and while the concept is simple, many implementations still got it wrong
  • In general prepared statements will have better performance for repeated uses (but this unfortunately is not always true if the cached query plan results in a miss by the server)
Prepared statements (and stored procedures) are superior to SQL escapes in just about all aspects, including performance and security. There are only three downsides that I am aware of for prepared statements:
  • they might cause the database to hold onto the referred objects so they cannot be dropped - this mainly impacts development environments, since in production that actually helps prevent the tragic accident of dropping tables, views, etc.
  • they might not work well for code that creates dynamic SQL statements referring to tables with unique prefixes (wordpress and a bunch of PHP code fall into this design style), since there might be thousands of such unique prefixes in a given database. In general, such a design really should be discouraged, since databases are designed for a few large tables rather than many small tables
  • they are not all that useful and can potentially be slower for one-off statements, but most of the time this is a non-issue
Anyhow - the reason I am highlighting the merit of prepared statements over SQL escapes is that I believe prepared statements are the way to go, especially for databases that already support them. So I decided to make the database drivers dbd-spgsql, dbd-jsqlite, and dbd-jazmysql implicitly create prepared statements when you do not explicitly name the query via the prepare call.

So - you can just write:

(exec h "insert into table1 values (?id , ?date)" `((id . 1) (date . ,(srfi19:current-date))))
And it will behave as if you do the following:

;; note - prepare only takes symbol as a key - so you cannot do this manually yourself
(prepare h "insert into table1 values (?id , ?date)" "insert into table1 values (?id , ?date)")
(exec h "insert into table1 values (?id , ?date)" `((id . 1) (date . ,(srfi19:current-date))))
So dbd-spgsql, dbd-jazmysql, and dbd-jsqlite will no longer use SQL escaping for parameterization going forward. They have been made available via planet.

Thank you Scott for discovering and reporting this issue.

Handling Time Zone in Scheme (4): Rule Conversion

Previously we have discussed timezone handling in a series of posts:
  1. motivation and overview
  2. parsing zoneinfo database
  3. calculating offsets
To continue from the third post where we are in the midst of calculating daylight saving offsets, we have figured out the applicable rules, and we now need to convert them into date structs so we can determine the exact boundary.

Going back to our two rule examples for America/Los_Angeles:

  (2007 +inf.0 - 3 (match 0 >= 8) (2 0 0 w) 3600 "D")
  (2007 +inf.0 - 11 (match 0 >= 1) (2 0 0 w) 0 "S")
We want to convert them into the applicable values (for both 2009 and the previous year - 2008):

2009/3/8 02:00:00-08:00 
2009/11/1 02:00:00-07:00
2008/3/9 02:00:00-08:00 
2008/11/2 02:00:00-07:00 
In order to do so, we'll first have to convert the ON field (day of the month) into the correct date value, and then convert the AT field (time of day) into the correct time value. Let's get started.

Day of the Month

The simplest ON format is a day number (ranging from 1-31), and for that we do not have to do too much. But there are also two other formats that are based on weekdays:

'(last 0) ;; => last sunday (sunday = 0, monday = 1 ..., saturday = 6) 
'(match 0 >= 5) ;; => sunday on or after 5th 
'(match 2 <= 10) ;; => tuesday on or before 10th 

That means we need to be able to convert them to the appropriate day of the month based on the year and the month.

Doomsday Algorithm and Weekday Calculation

To be able to calculate the weekday-based date values, we first need to be able to calculate the weekday of a particular date. For that we can make use of the doomsday algorithm, which is based on the concept that there is a doomsday every month, and they are easy to remember via monikers (4/4, 6/6, 8/8, 10/10, 12/12, ...). The linked explanation makes it sound more complicated than it actually is - below is the one-line doomsday algorithm in scheme:

(define (doomsday y)
  (modulo (+ 2 (floor (+ y (/ y 4) (- (/ y 100)) (/ y 400)))) 7))
Then with doomsday we can calculate the weekday of a date:
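A sketch of how that calculation can go (the helper names here are illustrative, not necessarily the ones used in the library): the weekday of y/m/d is the year's doomsday plus the distance from the month's doomsday date.

(define (leap-year? y)
  (and (zero? (modulo y 4))
       (or (not (zero? (modulo y 100)))
           (zero? (modulo y 400)))))

;; the memorable doomsday of each month: 1/3 (1/4 in leap years), 2/28 (2/29),
;; "3/0" (the last day of February), 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12
(define (doomsday-date m leap?)
  (vector-ref (if leap?
                  #(4 29 0 4 9 6 11 8 5 10 7 12)
                  #(3 28 0 4 9 6 11 8 5 10 7 12))
              (sub1 m)))

(define (week-day y m d) ;; 0 = Sunday, ..., 6 = Saturday
  (modulo (+ (doomsday y) (- d (doomsday-date m (leap-year? y)))) 7))

;; (week-day 2009 3 8)  => 0 (the 2009 DST switch date was a Sunday)
;; (week-day 2009 11 1) => 0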

Tuesday, September 29, 2009

Handling Time Zone in Scheme (3): Computing Offsets

This is the third post of the timezone handling series. Please see the previous posts for additional details:
  1. motivation and overview
  2. parsing zoneinfo database
At this point we are able to parse the zoneinfo database files into a giant structure that contains all of the applicable zones and rules. Let's see how to utilize the data.

There are two main purposes for using the time zone information, the first is to figure out the actual offset for a given zone for a particular date:
(tz-offset (build-date 2009 7 1 ...) "America/Los_Angeles") ;; => -25200 
(tz-offset (build-date 2009 12 1 ...) "America/Los_Angeles") ;; => -28800 
And the second use is to convert between different timezones of the same date:

(tz-convert (build-date 2009 7 1 0 0 0) "America/Los_Angeles" "GMT") 
;; => (date 2009 7 1 7 0 0 0 0) 
(tz-convert (build-date 2009 7 1 0 0 0) "America/Los_Angeles" "Asia/Taipei") 
;; => (date 2009 7 1 15 0 0 0 28800)

It should be clear that tz-convert can be built on top of tz-offset, so we first should figure out how tz-offset computes the offsets.

Basic Idea
The zoneinfo database structure more or less denotes how we should be making use of the library:
  1. look up the zone - find the zone that you are looking for
  2. based on the zone, map to the correct set of rules by testing which rule set is applicable (based on the UNTIL field of the zone value)
  3. apply the set of rules against the date to find the proper daylight saving offset

With that in mind, let's first load the parsed structure into something more usable: a hashtable keyed by zone name, where each zone in turn holds references to its applicable rules, which we can then use to compute the actual offsets. Assuming the zoneinfo database has been serialized to a file, the following loads the data back in:

(define (make-zoneinfo zi)
  (make-links (make-zones (caddr zi) (make-rules (cadr zi)))
              (cadddr zi)))

(make-zoneinfo (call-with-input-file path read))
Basically - we first load the rules, then pass them into make-zones, whose result is then passed to make-links to complete building the hashtable.

With the following structure definitions for zone and rule, we have:

(define-struct zone (offset rule format until) #:transparent)

(define-struct rule (from to type month date time offset format) #:transparent)

(define (make-rules rules)
  (make-immutable-hash (map (lambda (kv)
                              (cons (car kv)
                                    (map (lambda (rule)
                                           (apply make-rule rule)) 
                                         (cdr kv))))
                            (cdr rules))))

(define (make-zones zones rules)
  (make-immutable-hash (map (lambda (kv)
                              (cons (car kv)
                                    (map (lambda (zone)
                                           (define (helper offset rule format until)
                                             (make-zone offset (hash-ref rules rule #f) format until))
                                           (apply helper zone))
                                         (cdr kv))))
                            (cdr zones))))

And make-links left-folds over the zones and the links list:

Monday, September 28, 2009

Handling Time Zone in Scheme (2): Parsing The Zoneinfo Database

Continuing from the motivation post on the difficulty of handling timezone, we'll now focus on solving the first problem - parsing the zoneinfo database.

As previously discussed, the zoneinfo database format is pretty straightforward:
  • comments start with # and run until the end of the line
  • records are line oriented, and there are no line continuations
  • there are 3 main record types:
    • rule - describes daylight saving rules
    • zone - describes different time zones and how the offset (and the applicable daylight saving rules) change over the years
    • link - describes aliases for the different zones

What we want to accomplish is to parse the zoneinfo database and convert it into scheme values that are more consumable. Let's try to map them out.

Overall Structure

It would be nice if we could parse the files into a structure that looks like:

(zoneinfo (rule ("US" rule1 ...) ...) 
          (zone ("America/New_York" zone-rule1 ...) ...) 
          (link (Link1 ...) ...)) 
The above value would be easy to load back into memory (just a single read operation), and all of the related rules are grouped together.

Since the records are line oriented - when we parse them it would look like the following:

(rule "US" ...)
(rule "US" ...) 
...
(zone "America/New_York" ...) 
(zone "America/New_York" ...) 
...
(link ...) 
...
Assuming we are able to generate the above records, we can group them via the group function we introduced during the development of the memcached client: first group on rule, zone, and link; within each group, further group based on the sub key; then cons 'zoneinfo in front.

;; 2-layer grouping 
(cons 'zoneinfo 
      (map (lambda (kv)
             (cons (car kv)
                   (group (cdr kv))))
           (group zis)))
Once the transformation is done - we just need to serialize the data out to a file.

(call-with-output-file <target>
  (lambda (out)
     (print zoneinfo out)) 
  #:exists 'replace) 
Then we just need to make sure that we can parse the files (yes - the zoneinfo database consists of multiple files) so we can generate the records to feed into the 2-layer grouping. Since appending the results from parsing multiple files is trivial, we'll just discuss parsing a single file.

Parsing The Zoneinfo File

PLT Scheme offers a lot of fantastic support for parsing files. We will not go through it all here, but at a high level there are the parser tools, which are basically lex and yacc in scheme, and there is also a parser combinator library that you can use to construct monadic combinator parsers. At a low level, you can use the read-* functions to read just about anything you want from an input port.

Friday, September 25, 2009

Handling Time Zone in Scheme: Motivation & Overview

Timezones are a difficult problem, more difficult than they have to be. Probably the biggest challenge is daylight saving time. It changes with the whims of politicians and governments. Once in a while, a city or state will just arbitrarily decide to change its timezone. How should timezone software handle such changes?

In such a case, we might naturally assume the answer is to keep all of the same hours but in a different timezone. That answer would have been sufficient when things were not global: what if some of the appointments are with people in other timezones?

Luckily such a situation does not arise very often, and neither does a change to the daylight saving switch dates. But they illustrate the difficulty of timezone handling.

Insufficient Solution

Most software stores an offset along with a time to denote the timezone offset from GMT, and sometimes also a flag to indicate whether it is daylight saving time. In PLT Scheme, the PLT date object has both.

(define-struct date (second minute hour day month year week-day year-day dst? time-zone-offset))
But such a solution is brittle in the face of date manipulations, even without the drastic circumstances above. What if you want to calculate the date four months ahead? How do you know whether or not the time should have the same daylight saving offset applied?

A possible solution is to call the C date functions, which can handle the date calculation correctly, depending on the TZ environment variable. The drawback of that approach is that if your problem needs to be timezone aware, you'll be constantly swapping your environment. Such an approach also serializes all of your threads, which makes it even less desirable.

Let's see if we can bring timezone handling into scheme.

Zoneinfo Database

How do the C date functions know how to calculate dates? They consult the timezone information in a database called zoneinfo. This database contains all of the past and current timezones and their corresponding offsets, and it is the authoritative source if you need to handle timezones.

On linux/mac, type man zic and you will get the details on the format of the zoneinfo files. The zoneinfo files are line oriented, with comment lines starting with #. The two main constructs we are interested in from the files are the zones and the rules:

Thursday, September 24, 2009

Develop a Memcached Client (4) - BZLIB/DBD-MEMCACHED

This is the "dramatic" conclusion of the memcached client development. You can find additional details in the previous posts:
  1. network development overview
  2. memcached client - storage API
  3. memcached client - retrieval and deletion API, and the rest
  4. distributed hash table frontend for memcached

Alright - it's not that dramatic, but I've made the memcached client available as a DBI driver via planet, so you can use it with the DBI interface.

The nicest thing about DBI is the ease of extension. You can - say - create your own driver that wraps around the memcached driver and the postgresql driver, so you do not have to sprinkle your code with calls to both drivers everywhere.

You'll find the tutorial of using bzlib/dbd-memcached after the integration section.

Integrate with DBI

The last stop is to integrate both the single instance and the distributed hashtable instance into DBI, so we can use it with the DBI interface.

As you remember from the create a DBI driver series, we basically have to create the following functions and register them as a driver with DBI:
  • connect - wrap around the creation of the connection
  • disconnect - disconnect the underlying connection
  • prepare - for memcached it's a NOOP
  • query - the majority of work is here
  • begin-trans (optional) - for memcached it's a NOOP
  • commit (optional) - for memcached it's a NOOP
  • rollback (optional) - for memcached it's a NOOP
Below is the code that registers the single instance memcached client - integration of memcached/dht is left as an exercise:

(define (m-connect driver host port)
  (make-handle driver (memcached-connect host port) (make-immutable-hash-registry) 0))

(define (m-disconnect handle)
  (memcached-disconnect (handle-conn handle)))

(define (m-prepare handle stmt)
  (void)) 

(define (make-query set! add! replace! append! prepend! cas! get gets delete! incr! decr! flush-all!)
  (lambda (handle stmt (args '()))
    (let ((client (handle-conn handle)))
      (case stmt 
        ((set! add! replace! append! prepend!)
         (let/assert! ((key (assoc/cdr 'key args))
                       (value (assoc/cdr 'value args))
                       (flags (assoc/cdr 'flags args 0))
                       (exp-time (assoc/cdr 'exp-time args 0)))
                      ((case stmt
                         ((set!) set!)
                         ((add!) add!)
                         ((replace!) replace!)
                         ((append!) append!)
                         ((prepend!) prepend!))
                       client key value 
                       #:exp-time exp-time #:flags flags 
                       #:noreply? (assoc/cdr 'noreply? args))))
        ((cas!)
         (let/assert! ((key (assoc/cdr 'key args))
                       (value (assoc/cdr 'value args))
                       (cas (assoc/cdr 'cas args))
                       (flags (assoc/cdr 'flags args 0))
                       (exp-time (assoc/cdr 'exp-time args 0)))
                      (cas! client key value cas
                                      #:exp-time exp-time #:flags flags #:noreply? (assoc/cdr 'noreply? args))))
        ((get gets)
         (cons (list "key" "value" "flags" "cas")
               (apply (case stmt
                        ((get) get)
                        ((gets) gets))
                      client (map cdr (filter (lambda (kv)
                                                (equal? (car kv) 'key))
                                              args)))))
        ((delete!)
         (let/assert! ((key (assoc/cdr 'key args))
                       (delay (assoc/cdr 'delay args 0)))
                      (delete! client key delay (assoc/cdr 'noreply? args))))
        ((incr! decr!)
         (let/assert! ((key (assoc/cdr 'key args))
                       (value (assoc/cdr 'value args)))
                      ((case stmt
                         ((incr!) incr!)
                         ((decr!) decr!))
                       client key value (assoc/cdr 'noreply? args))))
        ((flush-all!) 
         (let/assert! ((delay (assoc/cdr 'delay args 10)))
                      (flush-all! client delay (assoc/cdr 'noreply? args))))
        (else
         (error 'query "invalid stmt: ~a" stmt))))))

Develop a Memcached Client (3) - Distributed Hash Table

This is continuation of the network development in PLT post. You can see the previous posts for more details:
  1. network development overview
  2. memcached client - storage API
  3. memcached client - retrieval and deletion API, and the rest
At this point we have a functional memcached client, but since people generally use memcached like a distributed hash table, what we have is not yet sufficient - we need a frontend to multiple instances of memcached.

The idea is simple - have a structure holding multiple memcached clients:

(define-struct memcached/dht-client (clients)) ;; clients is a vector of memcached clients.
Then we need to ensure that the key gets consistently mapped to the client that holds the data, and ensure the keys are distributed uniformly.

Uniform Hash Distribution

While the concept is pretty straightforward, the algorithm behind hashing is non-trivial. A good string hash function that I found is djb2; below is a scheme implementation:

(define (djb2-hash key)
  (define (convert key)
    (cond ((string? key) (convert (string->bytes/utf-8 key)))
          ((bytes? key) (bytes->list key))
          ((symbol? key) (convert (symbol->string key)))))
  (define (helper bytes hash)
    (cond ((null? bytes) hash)
          (else
           (helper (cdr bytes) (+ (* hash 33) (car bytes))))))
  (helper (convert key) 5381))
Once we convert the key into a hash code, we just need to take the remainder against the number of available memcached instances:

;; get the hash code 
(define (memcached/dht-hash client key)
  (remainder (djb2-hash key) (vector-length (memcached/dht-client-clients client))))
As long as the count and the order of the clients remain the same, we will hash the key to the same client:

;; return the particular client by the key 
(define (memcached/dht-target client key)
  (vector-ref (memcached/dht-client-clients client)
              (memcached/dht-hash client key)))
With these, we can now wrap around all of our APIs - we basically maintain the same interface as the single instance API, except that we pass in the dht client:
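A sketch of one such wrapper - the single-instance name memcached-set! and its keyword arguments are assumptions here, mirroring the keywords used in the DBI integration:

;; dispatch to whichever memcached instance owns the key
(define (memcached/dht-set! client key value #:exp-time (exp-time 0) #:flags (flags 0))
  (memcached-set! (memcached/dht-target client key) key value
                  #:exp-time exp-time #:flags flags))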

Wednesday, September 23, 2009

Develop a Memcached Client (2) - Filling Out the API

This is the continuation of the network programming series - see the previous installments for more details.

Now that we have the ability to store data into memcached, we want to retrieve the stored data.

Generating Requests

Memcached offers two APIs for this purpose:
  • get - return the data 
  • gets - return the data along with the cas id 
(yes - it's only one character difference... oh well)


Let's take a look at the details of the API:

get <key>*\r\n
gets <key>*\r\n
The <key>* means there can be multiple keys, separated by spaces. We can represent the request generation with the following:

(define (cmd-get out type keys) 
  (define (keys-helper)
    (let ((out (format "~a" keys)))
      (substring out 1 (sub1 (string-length out)))))
  (display-line out "~a ~a" (case type
                              ((get gets) type)
                              (else (error 'cmd-get "unknown get type: ~a" type)))
                (keys-helper)))
The next step is to parse the input.

Parse Response

The response is a bit harder, and it takes the following forms:
  • there can be multiple values returned (one for each key found), and the ending is marked by END\r\n
  • each value starts with a line of VALUE <key> <flags> <bytes-length> [<cas-unique>]\r\n
    • <cas-unique> is a 64-bit integer that uniquely identifies the object; and it is only returned if you use gets instead of get
  • then the value is followed by the data block, terminated by \r\n. The data block should have the same length as indicated by <bytes-length>

The following algorithm will handle the above responses:
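A sketch of one way to do it (the helper name and the exact result shape are illustrative, and for simplicity this version treats the data block as raw bytes):

(define (read-values in)
  (let loop ((acc '()))
    (let ((line (read-line in 'return-linefeed)))
      (cond ((or (eof-object? line) (equal? line "END")) (reverse acc))
            ((regexp-match #px"^VALUE (\\S+) (\\d+) (\\d+)(?: (\\d+))?$" line)
             => (lambda (m)
                  (let* ((key (list-ref m 1))
                         (flags (string->number (list-ref m 2)))
                         (len (string->number (list-ref m 3)))
                         (cas (and (list-ref m 4) (string->number (list-ref m 4))))
                         (data (read-bytes len in)))
                    (read-line in 'return-linefeed) ;; consume the trailing \r\n
                    (loop (cons (list key data flags cas) acc)))))
            (else (error 'read-values "unexpected line: ~a" line))))))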

Developing a Memcached Client (1) - Storage

[This is a continuation of the network development in PLT post]

Memcached is all the rage these days with the high scalability guys.  Legend has it that as soon as you slap a couple of memcached instances in between your web app and your database server, all of your scalability issues go away.  Even if that's glossing over the details, memcached certainly offers advantages over hitting the discs all the time, and we want it in PLT.

Let's try to build a memcached client.

Steps to Build a Client (or a Server)

The steps to build either a client or a server are pretty straightforward (independent of the complexity of the protocol):
  1. study the protocol 
  2. write the "output functions" that sends the info to the other party 
  3. write the "input functions" that parses the input from the other party 
  4. combine the input & output functions, and manage the states of the connection (if applicable)
You are likely to mix the steps up instead of doing them in the order prescribed above and that's okay.  The steps are numbered to help us to think of them as an orderly process, but the process is likely more iterative.

Alright - let's get started.

Study the Protocol 

The key to building a network interface is understanding the protocol.  It is best if the protocol is open and published, so you have something written that you can study (hopefully in a language you understand).  The next best option is to reverse engineer from an open source implementation, and the worst is to reverse engineer by sniffing the network interactions between the clients and the server.  I am glad that memcached's protocol is already published so we do not have to go down the other routes.

Please refer to the memcached protocol document throughout this process as the official reference.  We'll sprinkle information from the official doc into the posts as necessary.

First a quick overview of the memcached protocol.
  • request/response protocol - there are no unexpected responses that the client has to handle 
  • multiple requests and responses within a single session 
  • *mostly* line-based protocol - the commands are single lines terminated by CRLF; the data is also terminated by CRLF, but its extent is actually determined by a length parameter
  • can store, retrieve, and delete the content (as well as flush) 
  • there are also statistics that can be retrieved, but the protocol appears in flux for the stats (so it'll be out of scope for now) 
  • the protocol itself is *stateless* as each of the requests and responses are independent from each other (each request is atomic) 
Overall the protocol is simple yet non-trivial, which gives us a great example for network development.  We'll start with the storage part of the protocol.

Storage API

Memcached has multiple storage commands for different situations:
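The storage commands (set, add, replace, append, prepend, and cas) share a common request shape: a command line of the form <command> <key> <flags> <exptime> <bytes>\r\n, followed by the data block and its own CRLF (cas additionally takes a cas-unique value).  A minimal sketch of generating such a request, reusing the display-line helper assumed elsewhere in this series (it formats its arguments and appends \r\n):

;; a sketch; cmd-store is a hypothetical name, and type is one of:
;; set, add, replace, append, prepend
(define (cmd-store out type key flags exptime data)
  (display-line out "~a ~a ~a ~a ~a" type key flags exptime (bytes-length data))
  (write-bytes data out)    ;; the data block itself...
  (display-line out ""))    ;; ...followed by its terminating CRLF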

Tuesday, September 22, 2009

Network Development in PLT: Overview

One of the nice things about PLT Scheme is that you can write native scheme network apps, rather than having to use FFI to interface with C libraries for network access. The nice thing about the native scheme approach is that it works with PLT's threads, unlike FFI, which blocks all PLT threads (and effectively synchronizes the network access). And according to Geoffrey, FFI also has the disadvantage of having to worry about whether it expects 32-bit or 64-bit pointers.

The challenge with network development, though, is that we seldom have to do it, so it's hard to know all of the ins and outs. This series of posts will provide a tutorial on network development. The goal is to keep the principles as general as possible so they apply to other programming languages, but obviously this is a scheme blog.

General Concept

We'll quickly go over the basic network architecture concept for the sake of completeness.

In general, networking involves clients and servers interacting by sending information to each other and (if necessary) waiting for responses from the other party.

Clients are programs that send out the requests, and servers are programs that "fulfill" the requests (and possibly send back a response).  A program can be both a client and a server at the same time.

The information they send each other are serialized to bytes, which will be reconstructed by the receivers into meaningful representations that they can interpret.

Depending on the nature of the work, clients and servers might only need to send information to each other once and be done (HTTP is one such protocol), but sometimes they need to communicate back and forth to accomplish the task (SMTP is one such protocol), and in such situations it might be necessary for the client and the server to keep track of state in order to manage the work.

So, at a high level architecture, we need to focus on the following in order to do network development:

  • manage the connection (initiation, termination, etc) 
  • serialize and send information to the other party 
  • receive information from the other party and interpret the meaning (and act accordingly)
  • track and manage the state if the protocol requires it 
The above applies to all network development.  The specific details and the associated complexity come down to the specific protocol you are developing.

Let's take a look at what PLT offers to help us handle each of the needs.

Network Connection Management 

By default PLT offers network programming in TCP, UDP, and SSL.  You can require them into your module:

(require scheme/tcp scheme/udp openssl)

We'll focus on just TCP network development in this tutorial since most network protocols will be TCP-based.

Initiate Connections 

If you are developing a client, you can initiate a client connection with:
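For example, connecting to a memcached server on the local machine (host and port here are placeholders) uses tcp-connect, which returns an input port and an output port:

;; two values: an input port and an output port connected to the server
(define-values (in out) (tcp-connect "localhost" 11211))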

Monday, September 21, 2009

bzlib/dbd-file - filesystem-based database driver - now available

As previously promised, bzlib/dbd-file, a demonstration of how to extend bzlib/dbi around the filesystem, is now available via planet.

You can revisit the posts that chronicled its developments:
  1. Overview of the DBI internals 
  2. a draft driver - list directories and open files
  3. enhance the driver - save files atomically
  4. enhance the driver - delete files atomically 
Prerequisite 


There isn't a prerequisite for this module, but there are a few caveats:
  • No transaction support (only atomicity)
  • Atomicity not guaranteed on Windows (it will work most of the time, but not always)
  • No prepared queries 
  • No SQL - this is not a SQL database driver
Installation 

The installation over planet is straightforward:

;; in REPL  
(require (planet bzlib/dbi))
(require (planet bzlib/dbd-file))


Usage 

You should set aside a particular directory as the root for the database.  Let's call that <root-path>.  As with any database, you should not manually access the files within that directory, to minimize the chance of destroying the database.

To connect to the database:

;; use the 'file driver  
(define handle (connect 'file <root-path>))  
;; or use the 'file/rs driver 
(define handle (connect 'file/rs <root-path>))

The difference between the two drivers is that 'file/rs returns values compatible with the DBI query helper functions such as rows, rows/false, and cell/false, since it converts the underlying return value into a recordset structure; it is therefore just a bit less efficient.

To disconnect from the database:

(disconnect handle) ;; NOOP  

The call itself is unnecessary since there are no external resources to clean up.  prepare is also a NOOP.

To "select" the data from the files by paths:

(query handle 'open  `((path . "/path1") (path . "/path/to/file2") ...)) 
;; => a list of bytes, each being the full content of the corresponding file 

The paths used with the driver (either returned values or passed arguments) are always jailed absolute paths, which are merged with the root directory to form the actual underlying path.

To "select" the paths a particular base path:

(query handle 'list `()) ;; list the files in the root directory
(query handle 'list `((path . "/some/path"))) ;; list the files in some path 
;; => returns a list of the jailed paths  

To "insert" or "update" a particular path:

(query handle 'save! `((path . "/some/path") (content . #"some bytes")))  

Both the path and content parameters are required in this particular case.  The file will be saved atomically.

To "delete" files or directories:

;; delete files (not directories)
(query handle 'delete! `((path . "/some/path") (path . "/some/other/path") ...))

;; delete empty directories
(query handle 'rmdir! `((path . "/some/path") (path . "/some/other/path") ...))

;; delete either files or directories 
(query handle 'rm-rf! `((path . "/some/path") (path . "/some/other/path") ...))

'delete! will only delete files, and will error when trying to delete directories.  'rmdir! will only delete empty directories, and will error when trying to delete either files or non-empty directories.  'rm-rf! will delete either files or directories (whether empty or not).  The deletions are atomic on non-Windows platforms.

Notes About Windows

Since Windows locks open file handles, and generally has background processes such as antivirus software that randomly open files for inspection, you might find that the save and deletion operations error out intermittently.  You can of course handle the errors and then retry the operations to get around the issue, but supporting atomic file-based save/deletion on Windows is basically out of scope.

That's it for now - enjoy.

BZLIB/FLEXER 0.1 - FLEX Integration with SHP - Now Available

bzlib/flexer is now available through planet.

As described in the FLEX/SHP posts, bzlib/flexer provides integration between FLEX and SHP, allowing you to do the following:
  1. directly embed flash video files by calling (flash <url> ...)
  2. directly code in MXML within SHP and compile it into a flash video with (mxml ...)
  3. optimizing the compilation of MXML
The package is released under LGPL. 

Prerequisite

You'll need the Adobe Flex SDK 3 downloaded and installed separately, and make sure that the flex compiler mxmlc is on the system path.

You'll also need bzlib/shp, with the planet version >= 1.2. 

Installation & Configuration


Install via planet with

(require (planet bzlib/shp:1:2))
(require (planet bzlib/flexer))

Make sure to require bzlib/flexer in your SHP required script as well:
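For example (the exact location of the required script depends on your SHP setup), the script would contain a line like:

(require (planet bzlib/flexer))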

Saturday, September 19, 2009

Create a Driver for bzlib/dbi (4) - Concluding Filesystem Driver

This is the fourth installment of the extending bzlib/dbi series - for refreshers, see the following previous posts:
  1. Overview of the DBI internals 
  2. a draft driver - list directories and open files
  3. enhance the driver - save files atomically
Basically we now have the SQL equivalent of select, insert, and update.  The next stop is to provide the ability to delete files and directories.

The Deletion Capability 

PLT Scheme offers the following for deleting files and directories:
  • (delete-file file-path) - deletes the path if it is a file; otherwise raises an error
  • (delete-directory dir-path) - deletes the path if it is an empty directory; otherwise raises an error 
  • (delete-directory/files file-or-dir-path) - deletes the path whether it is a file or a directory, and if it is a non-empty directory, the sub-files and directories will first be deleted recursively 
It seems that delete-directory/files is the most convenient to use.  However, there might be situations where you want to ensure that you are deleting a file rather than a directory, or that a directory you delete is empty, and we want our API to reflect that, so we will have 3 separate calls:

;; delete files use delete! 
(query handle 'delete! `((path . path1) ...))
;; delete empty-only directories uses rmdir!
(query handle 'rmdir! `((path . path1) ...))
;; delete either file or directories use rm-rf! 
(query handle 'rm-rf! `((path . path1) ...)) 

Similar to SQL's delete statement, the above query can delete multiple paths at once.  However, given that transactions are not supported, the deletes are done independently; if any of the deletes fails, the remaining paths stay undeleted.

The following addition to file-query accomplishes the above:
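A minimal sketch of those branches is shown below (paths-helper is a hypothetical helper that collects every (path . <p>) pair from args and jails each one via the path-helper from the earlier draft; the released driver may differ):

(define (paths-helper args)
  ;; collect every (path . <p>) pair and merge it with the root directory
  (map (lambda (kv) (path-helper (cdr kv)))
       (filter (lambda (kv) (eq? (car kv) 'path)) args)))

;; inside file-query's case expression...
((delete!) (for-each delete-file (paths-helper args)))
((rmdir!)  (for-each delete-directory (paths-helper args)))
((rm-rf!)  (for-each delete-directory/files (paths-helper args)))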

Friday, September 18, 2009

Create a Driver for bzlib/dbi (3) - Continuing Filesystem Driver

Previously we have discussed the internals of DBI and how to extend it, as well as crafted the first draft of the filesystem database driver.  We are now onto #3 of the DBI extension series.  We'll continue by adding the ability to save data to the driver.

Save The Data

Saving the data is equivalent to both insert and update in SQL vernacular.  The simplest version looks like the following:

(define (file-query handle stmt (args '())) 
  ... 
  (case stmt
    ... 
    ((save!)
     (call-with-output-file (path-helper (assoc/cdr 'path args))
       (lambda (out) 
         (write-bytes (assoc/cdr 'content args) out))
       #:exists 'replace))
    (else 
     (error 'file-query "unknown statement: ~a" stmt))))

With the above we now can save data into a particular file with the following usage:

(query handle 'save! `((path . "/foo/bar/baz.txt") (content . #"this is the content")))  
But unfortunately, there are a few hiccups:
  • there is no verification that path and content key/value pairs are passed in (for 'list and 'open queries the path key/value pairs are optional) 
  • there is no guarantee the directory of the path exists (and if not it will result in an error) 
  • the above saving is not an atomic operation and can corrupt the data
So we'll address each of these issues to ensure we have a solid implementation for saving data.

Argument Verification 

To verify the arguments, we can use let/assert! from the next version of bzlib/base (which will be released together with this driver) as follows:

(define (file-query handle stmt (args '())) 
  ... 
  (case stmt
    ... 
    ((save!)
     (let/assert! ((path (assoc/cdr 'path args))
                   (content (assoc/cdr 'content args)))
                  (call-with-output-file (path-helper path)
                    (lambda (out) 
                      (write-bytes content out))
                    #:exists 'replace)))
    (else 
     (error 'file-query "unknown statement: ~a" stmt))))

let/assert! checks whether any of the bound values is false; if so it raises an error, otherwise it binds the variables and evaluates the inner expressions.  It also behaves like let* instead of let, so subsequent bindings can see the previous ones.

Guarantee of Directory Path

To ensure the directory for the path already exists, we can use make-directory* to create the parent directory of the path, but we need to make sure the parent of the path is not a file (so that we can create the directory):
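A sketch of that guarantee might look like the following (ensure-parent-dir! is a hypothetical name; make-directory* comes from scheme/file and creates intermediate directories as needed):

(define (ensure-parent-dir! path)
  (let-values (((parent name must-be-dir?) (split-path path)))
    (when (path? parent)
      (when (file-exists? parent) ;; the parent must not be an existing file
        (error 'file-query "parent path is a file: ~a" parent))
      (make-directory* parent))))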

Thursday, September 17, 2009

Create a Driver for bzlib/dbi (2) - Filesystem Driver

In our previous post - Create a Driver for bzlib/dbi (1) - DBI Internals - we discussed the motivation and DBI internals so we can get started to implement our filesystem-based driver.  If you find some of the following a bit opaque - read the previous post for some explanations.


Caveats about Filesystems and the Driver

Since each filesystem behaves differently, it is difficult to guarantee particular performance characteristics of the driver. For now the following are the caveats:
  • No transactional support - it takes more effort to build in transactional support for bare filesystems - something we will not tackle for now 
  • No atomic writes on the Windows filesystem - as the Windows filesystem does not fully support atomic rename (a Windows program holds a lock on the file if it is open during the rename, which will cause an error), we cannot guarantee that writes will succeed on Windows.  Under Unix variants we can guarantee atomic writes 
Furthermore, since our goal is to have a simple driver, the following are also out of scope:
  • SQL statement mappings - for now we will not support SQL statements
  • Prepared statements - since we are not supporting complex SQL mappings, there is no reason to have prepared statement capabilities; i.e. the prepare call will be a no op.
With caveats out of the way - let's determine what we should be able to do at a minimum:
  • manage all files within a single directory as the database data 
  • open files by path and return their contents 
  • listing the files in a particular directory (within the base directory) 
  • save data against a particular path (this can be either an "insert" or an "update" operation) atomically on Unix platforms (might cause errors on Windows given the platform's limitations) 
  • delete a particular file 
  • delete a particular directory 
  • create a particular directory (even when the intermediate directories do not yet exist) 
All right - let's get started.

Connect, Disconnect, Prepare, and Transaction Handling 

Since we want to have a directory representing the root of the database, our database connection is really the root directory:

#lang scheme/base 
(require (planet bzlib/base)
         (planet bzlib/dbi)
         )

(define (file-connect driver path) 
  (assert! (directory-exists? path)) ;; assert! comes from bzlib/base 
  (make-handle driver path (make-immutable-hash-registry) 0)) 

Disconnect is even more straightforward, since there aren't any external resources that have to be released:

(define (file-disconnect handle)
  (void)) 

And since prepare is out of scope, it is also a NOOP:

(define (file-prepare handle stmt)
  (void)) 

Furthermore, transaction support is also out of scope - we have more NOOPs:

(define (file-begin handle)
  (void))
(define (file-commit handle)
  (void))
(define (file-rollback handle)
  (void)) 

The default transaction functions will not suffice here since they issue the corresponding SQL statements against the handle.

Assuming we have the corresponding file-query defined we now have a complete driver with:

(registry-set! drivers 'file
               (make-driver file-connect
                            file-disconnect
                            file-query
                            file-prepare
                            file-begin
                            file-commit
                            file-rollback))
    
Now we just need to flesh out file-query, which is the meat of the driver:
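As a preview, a minimal sketch of its shape might look like the following (handle-conn is assumed to return the root directory stored by file-connect, path-helper is the hypothetical jailing helper, and file->bytes comes from scheme/file; the actual implementation is fleshed out in the follow-up posts):

(define (file-query handle stmt (args '()))
  (define (path-helper jailed) ;; merge a jailed absolute path (e.g. "/foo") with the root
    (build-path (handle-conn handle) (substring jailed 1)))
  (define (paths) ;; every value of a (path . <p>) pair in args
    (map cdr (filter (lambda (kv) (eq? (car kv) 'path)) args)))
  (case stmt
    ((list) (directory-list (if (null? (paths))
                                (handle-conn handle)
                                (path-helper (car (paths))))))
    ((open) (map (lambda (p) (file->bytes (path-helper p))) (paths)))
    (else (error 'file-query "unknown statement: ~a" stmt))))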

Create a Driver for bzlib/dbi (1) - DBI Internals

Since we claimed bzlib/dbi as an extensible abstract database interface for PLT Scheme, let's take a look at how to extend it.  As an exercise we'll create a database driver that uses the filesystem as the database. 

Motivation 

What we want is to use the filesystem as a simple form of database, where the path of the file is the key, and the content is the value.  We should be able to select the data, update, delete, and insert new data.  Let's see what we can come up with.

Overview of Creating a Database Driver

The first step is to require bzlib/base (contains bzlib/base/registry) and bzlib/dbi (contains the interface that we need to extend):

#lang scheme/base
(require (planet bzlib/base) (planet bzlib/dbi)) 

Then we need to create four functions for the driver that matches the following signature:

(connect driver . args) ;; a connect function that takes in the database driver definition, and a variable list of arguments
(disconnect handle) ;; a disconnect function that takes in the database handle
(prepare handle statement) ;; a prepare function that takes in the database handle and a statement
(query handle statement args) ;; a query function that takes in the database handle, the statement (or the key to the prepared statement), and the args as a key/value list

The above four functions are sufficient for creating a database driver, and if we need to provide custom transaction functions, we will customize three additional functions:

(begin-trans handle) ;; to provide the begin transaction clause for the database handle
(commit handle) ;; to provide the commit transaction clause for the database handle
(rollback handle) ;; to provide the rollback transaction clause for the database handle 

Overriding the transaction functions is an optional step, as you can use the default versions, which are named default-begin, default-commit, and default-rollback.  All of the default versions basically issue the corresponding SQL statements to the underlying database connection.

Then we just need to register the functions as a database driver:
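For the filesystem driver this registration ends up as the following call (the same one used once the driver functions are in place):

(registry-set! drivers 'file
               (make-driver file-connect
                            file-disconnect
                            file-query
                            file-prepare
                            file-begin
                            file-commit
                            file-rollback))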

Tuesday, September 15, 2009

Flex Compiler Integration Optimization

This is a continuation of the FLEX Integration Series - please refer to the previous posts for more details:

  1. Building FLEX Integration with SHP
  2. FLEX Compiler Integration with SHP

Now we'll cover how to optimize the integration for performance.

The easiest way to determine whether a new compilation is necessary is by comparing the timestamp of the source code vs the timestamp of the object code - even the venerable make utilizes this simple check.

So what we need to do is determine the timestamp of the shp script in which we call mxml, and then compare it against the flash video file.  Since mxmlc compilation takes a long time, if the shp source script has a later timestamp than the flash video, it is very safe to assume that the source script has been changed.
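A minimal sketch of that check (the function and parameter names are hypothetical), using file-or-directory-modify-seconds:

;; recompile only when the swf is missing or older than the shp source
(define (needs-recompile? shp-src swf-out)
  (or (not (file-exists? swf-out))
      (> (file-or-directory-modify-seconds shp-src)
         (file-or-directory-modify-seconds swf-out))))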

Determining the Timestamp 

PLT Scheme offers (this-expression-source-directory) and (this-expression-source-file-name) to help determine the location.  But as those are macros that apply to the location where they appear, they are not exactly helpful to our cause unless we want to write them out every time we use mxml, like this:


(mxml (this-expression-source-directory) path ...)  
We want the value to be available automatically rather than having to pass it in manually, so we'll have to improve our SHP handler.


(define __PATH__ (make-parameter (shp-handler-path ($server)))) ;; looks like a C macro... 

(define (evaluate-script path)
  (evaluate-terms (file->values path) path))

(define (evaluate-terms terms path)
  (require-modules! terms) ;; first register the required modules 
  ;; then we filter out the require statements and evaluate the rest of the terms as a proc 
  (eval `(lambda ,(terms->args terms) 
           (parameterize ((__PATH__ ,path)) ;; unquote so the actual path value is embedded 
             . ,(terms->exps terms)))
        handler-namespace))
Now we have the path available within the SHP script scope, and we can use it to check the timestamp.

Flex Compiler Integration with SHP

This is a continuation of Building FLEX Integration with SHP - please refer to it for refresher.

The next step of the integration is to do FLEX development directly within SHP. If you are used to developing FLEX apps in an IDE such as Flex Builder, you might not necessarily need this capability.  But consider the following:
  • potential for modularizing the flex scripts
  • potential for graceful degradation into ajax or pure html 
  • possibility for automating the generation of simple flex-based solutions 
All of these are powerful building blocks for web development, so we'll give it a shot.  Let's get started.

Goal

We want to be able to write MXML and ActionScript in SHP, and have the script compiled in real time into a flash movie and served to the browser transparently.

Example:

Here's an mxml shp script for hello world:

(mx:app (mx:script "private function clickHandler(evt:Event):void {
    messageDisplay.text = \"I am Glad, it does.\";
}")
     (mx:label "Flex without Flex Builder")
     (mx:button #:label "Yes, It Works!" #:click "clickHandler(event)")
     (mx:label #:id "messageDisplay"))
This should be compiled into the following mxml:

<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml">
<mx:Script>
<![CDATA[
private function clickHandler(evt:Event):void {
    messageDisplay.text = "I am Glad, it does.";
}
 ]]>
</mx:Script>
<mx:Label text="Flex without Flex Builder" />
<mx:Button label="Yes, It Works!" click="clickHandler(event)" />
<mx:Label id="messageDisplay" />
</mx:Application> 
And then compiled into a flash video file, and finally served to the client via the (flash) inclusion.  And we should check the timestamp of the SHP script against the compiled flash video so we won't waste CPU cycles by compiling the same script over and over again.

The first thing to do is to ensure we can generate the correct MXML.  It is mostly straightforward:


(define (mx:app . widgets) 
  `(mx:Application ((xmlns:mx "http://www.adobe.com/2006/mxml"))
                   . ,widgets))

(define (mx:label (text "") #:id (id #f))
  `(mx:Label ((text ,text)
              ,@(if (not id) '() `((id ,id))))))

(define (mx:button #:label (label "") #:click (click #f))
  `(mx:Button ((label ,label) 
               ,@(if (not click) '() `((click ,click))))))

;; more mxml widget definitions  
With the above we can now generate the MXML, albeit in an incomplete form.  The next step would then be to get the generated mxml *saved* to a predefined location, instead of serving it.
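A sketch of that saving step (save-mxml! and the output path are placeholders; xexpr->string comes from the xml collection):

(require xml) ;; for xexpr->string

(define (save-mxml! xexpr path)
  (call-with-output-file path
    (lambda (out)
      (display "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" out)
      (display (xexpr->string xexpr) out))
    #:exists 'replace))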