A large class of parsing tasks involves tokenizing the stream by skipping over whitespace. For example, to parse a list of three integers separated by commas, we currently have to write:
(define three-int-by-comma
  (seq whitespaces
       i1 <- integer
       whitespaces
       #\,
       whitespaces
       i2 <- integer
       whitespaces
       #\,
       whitespaces
       i3 <- integer
       (return (list i1 i2 i3))))
The code above looks messy, and it would be nice not to have to spell out the whitespace parsing explicitly.
token allows us to abstract away the parsing of whitespace:
(define (token parser (delim whitespaces))
  (seq delim
       t <- parser
       (return t)))
With token, three-int-by-comma can now be rewritten as:
(define three-int-by-comma2
  (seq i1 <- (token integer)
       (token #\,)
       i2 <- (token integer)
       (token #\,)
       i3 <- (token integer)
       (return (list i1 i2 i3))))
This looks a lot better. But since tokenizing is such a common parsing task, there is a shorthand for the above called tokens:
(define-macro (tokens . exps)
  (define (body exps)
    (match exps
      ((list exp) (list exp))
      ((list-rest v '<- exp rest)
       `(,v <- (token ,exp) . ,(body rest)))
      ((list-rest exp rest)
       `((token ,exp) . ,(body rest)))))
  `(seq . ,(body exps)))
This reduces the three-integer parser to the following:
(define three-int-by-comma3
  (tokens i1 <- integer
          #\,
          i2 <- integer
          #\,
          i3 <- integer
          (return (list i1 i2 i3))))
There is also a case-insensitive version of tokens, tokens-ci, that allows character and string tokens to be parsed case-insensitively.
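To see what the tokens macro generates, its rewriting step can be tried out as an ordinary function over quoted forms. The following is a minimal sketch in plain Racket that does not load bzlib/parseq; the name tokens-body is hypothetical and simply mirrors the inner body helper of the macro.

```racket
#lang racket
;; Hypothetical stand-alone copy of the macro's inner `body` helper.
;; It works on quoted data only and does not require bzlib/parseq.
(define (tokens-body exps)
  (match exps
    ;; The last expression (normally the `return`) is left untouched.
    [(list exp) (list exp)]
    ;; A binding `v <- exp` becomes `v <- (token exp)`.
    [(list-rest v '<- exp rest)
     `(,v <- (token ,exp) . ,(tokens-body rest))]
    ;; Any other expression is wrapped in `token` as well.
    [(list-rest exp rest)
     `((token ,exp) . ,(tokens-body rest))]))

(tokens-body '(i1 <- integer #\, i2 <- integer (return (list i1 i2))))
;; => (i1 <- (token integer) (token #\,) i2 <- (token integer) (return (list i1 i2)))
```

The macro then prepends seq to this list, so every parser except the final expression is wrapped in token.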
Besides tokenizing, another common need in token-based parsing is to handle delimited sets. In the above example, the 3 integers are delimited by commas.
delimited generalizes the pattern:
(define (delimited parser delim)
  (tokens v <- parser
          v2 <- (zero-many (tokens v3 <- delim
                                   v4 <- parser
                                   (return v4)))
          (return (cons v v2))))
The following parses a list of comma-delimited integers:
(delimited integer #\,)
Another common pattern is to parse for brackets that surround the value you need. Just about all programming languages have such constructs, and bracket handles such parses:
(define (bracket open parser close) (tokens open v <- parser close (return v)))
bracket/delimited combines the two, for the case where you need to parse a bracketed list of delimited values:
(define (bracket/delimited open parser delim close)
  (tokens open
          ;; even the parser is optional...
          v <- (zero-one (delimited parser delim) '())
          close
          (return v)))
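To make the behavior of these combinators concrete, here is a minimal toy model in plain Racket. It is not bzlib/parseq and does not use its API: every name prefixed with toy- is hypothetical, and a parser is assumed to be a function from a list of characters to (cons value remaining-chars), or #f on failure. The sketch only illustrates how delimited collects values between delimiters and how bracket keeps the inner value; it leaves out the zero-one case that makes the bracketed contents optional.

```racket
#lang racket
;; Toy model only: parser = (listof char) -> (cons value rest) or #f.

;; Skip leading whitespace, then run `parser` (the role of `token`).
(define (toy-token parser)
  (lambda (chars) (parser (dropf chars char-whitespace?))))

;; Match one specific character.
(define (toy-char c)
  (lambda (chars)
    (and (pair? chars) (char=? (car chars) c) (cons c (cdr chars)))))

;; Match one or more digits and return them as a number.
(define (toy-integer chars)
  (define-values (digits more) (splitf-at chars char-numeric?))
  (and (pair? digits)
       (cons (string->number (list->string digits)) more)))

;; parser (delim parser)* -> list of the parser's values.
(define (toy-delimited parser delim)
  (lambda (chars)
    (match ((toy-token parser) chars)
      [#f #f]
      [(cons v0 more0)
       (let loop ([acc (list v0)] [more more0])
         (match ((toy-token delim) more)
           [#f (cons (reverse acc) more)]
           [(cons _ after-delim)
            (match ((toy-token parser) after-delim)
              ;; No value after the delimiter: stop before the delimiter.
              [#f (cons (reverse acc) more)]
              [(cons v after-value) (loop (cons v acc) after-value)])]))])))

;; open parser close -> the value produced by `parser`.
(define (toy-bracket open parser close)
  (lambda (chars)
    (match ((toy-token open) chars)
      [#f #f]
      [(cons _ after-open)
       (match ((toy-token parser) after-open)
         [#f #f]
         [(cons v after-value)
          (match ((toy-token close) after-value)
            [#f #f]
            [(cons _ after-close) (cons v after-close)])])])))

;; Usage: a bracketed, comma-delimited list of integers.
(define toy-int-list
  (toy-bracket (toy-char #\[)
               (toy-delimited toy-integer (toy-char #\,))
               (toy-char #\])))

(car (toy-int-list (string->list "[ 1, 2 ,3 ]")))
;; => (1 2 3)
```

Because each piece goes through toy-token, whitespace around the brackets, values and delimiters is skipped without being mentioned in the grammar, which is the same convenience tokens provides in the definitions above.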
That's it for the bzlib/parseq API. If you find anything missing, please let me know.