If you need a refresher on parser combinators, you can read my previous posts for a quick tour.
(require (planet bzlib/parseq))
The package includes the following examples that you can inspect for the usage of the API:
(require (planet bzlib/parseq/example/csv)   ;; a customizable csv parser
         (planet bzlib/parseq/example/calc)  ;; a simple calculator with parens
         (planet bzlib/parseq/example/regex) ;; a simplified regular expression parser & evaluator
         (planet bzlib/parseq/example/sql)   ;; parsing SQL's create table statement
         (planet bzlib/parseq/example/json)  ;; parses JSON data format
         )
Parser Type Signature & Input
A parser is a function that has the following signature:
(-> Input/c (values any/c Input/c)) ;; returns the value and the next input
If the value is #f, then the parse has failed. This might change in the future to another value so that #f can be returned as a parsed value.
input is a struct with the following structure:
(define-struct input (source pos) #:prefab)
It is an abstraction over an input-port so you can keep track of the current position on the port. The function make-input takes an input-port, a string, or bytes and returns an input struct with the position initialized to 0.
During parsing, values are peeked rather than read so that the parser can backtrack to the beginning. That means that when parsing finishes, all of the data is still in the port.
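To make the signature and the peek-only behaviour concrete, here is a minimal sketch rather than the library's own code. It redefines the input struct locally, so make-input here is the plain two-field constructor and not the library's make-input, and next-char is a hypothetical parser that fails with #f at end of input.

```racket
#lang racket/base

;; Same shape as the struct shown above, redefined so the sketch runs on its own.
;; NOTE: this make-input is the raw struct constructor (source pos), an assumption
;; made for the sketch; the library's make-input takes a single argument.
(define-struct input (source pos) #:prefab)

;; Hypothetical parser: returns the next character and the advanced input,
;; or #f and the unchanged input when there is nothing left.
(define (next-char in)
  (define c (peek-char (input-source in) (input-pos in)))
  (if (eof-object? c)
      (values #f in)
      (values c (make-input (input-source in)
                            (+ (input-pos in) (char-utf-8-length c))))))

(define port (open-input-string "ab"))
(define-values (v1 in1) (next-char (make-input port 0)))
(define-values (v2 in2) (next-char in1))
(define-values (v3 in3) (next-char in2))

;; Only peeks happened, so the port still holds all of its data.
(define leftover (read-string 2 port))
```

Each call returns two values, the parsed value and the next input, and the position is tracked in the struct instead of by consuming the port.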
make-reader wraps a parser so you can pass an input-port directly instead of creating an input struct; it also consumes the bytes if the parse is successful:
(make-reader parser) ;; => (-> input-port? any/c)
Fundamental Parsers (input not peeked)
The following parsers do not peek into the input.
<v> that you specify.
failed struct that includes the position of the parse when the parser fails. The failed struct is currently defined as follows:
(define-struct failed (pos) #:prefab)
failed? tests whether the returned value is a failed struct.
(compose not failed?)).
SOF (start of file) will return 'sof if it is the start of the input (i.e., position = 0).
Fundamental Parsers (input peeked)
item peeks the input, tests it to ensure that it satisfies a criterion, and if so, returns the value and advances the port by the size of the peeked value:
(item <peek> <isa?> <satisfy?> <size>)
peek => (-> Input/c any/c)
isa? => (-> any/c boolean?)
satisfy? => (-> any/c any)
size => (-> any/c exact-integer?)
isa? tests for the return value's type so you can simplify the writing of satisfy?, which can assume the value is of the right type.
Use item only when you need to create new parsers that the library does not already provide.
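As a rough illustration of how the four arguments fit together, here is a minimal sketch of an item-style constructor. It is not the library's implementation: item*, peek-one-byte, and digit-byte are hypothetical names, the input struct is redefined locally with its plain two-field constructor, and failure is signalled with #f as described earlier.

```racket
#lang racket/base

;; Local copy of the input struct so the sketch is self-contained (an assumption).
(define-struct input (source pos) #:prefab)

;; Hypothetical item-style constructor: peek a value, check its type, check the
;; criterion, and on success advance the position by (size value).
(define (item* peek isa? satisfy? size)
  (lambda (in)
    (define v (peek in))
    (if (and (isa? v) (satisfy? v))
        (values v (make-input (input-source in) (+ (input-pos in) (size v))))
        (values #f in))))

;; Peek a single byte at the current position without consuming it.
(define (peek-one-byte in)
  (peek-byte (input-source in) (input-pos in)))

;; A parser for ASCII digits (bytes 48 through 57) built from the four pieces.
;; satisfy? may assume a byte because isa? has already checked the type.
(define digit-byte
  (item* peek-one-byte
         byte?
         (lambda (b) (<= 48 b 57))
         (lambda (b) 1)))
```

Because isa? runs first, the satisfy? predicate never sees an eof object, which is the simplification the text describes.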
Non-Character Type Parsers
bzlib/parseq allows non-character parsers, so you can use it to parse binary data instead of just text streams. You can mix them together, of course.
(bytes= <bytes>) returns when the next set of bytes equals the passed-in bytes. For example:
> ((bytes= #"foo") (make-input "foo bar"))
#"foo"
#s(input #
(string= <string>) returns when the next set of bytes (in string) equals the passed-in string.
(string-ci= <string>) is the case-insensitive version of string=.
(byte= <byte>) returns when the next byte equals the passed-in byte.
(bits= <bits>) returns the next byte when it equals the passed-in bits (a list of eight 0s or 1s).
bits= is built on top of byte-when, which is built on top of item.
byte-when has the following signature:
(byte-when <satisfy?> (<isa?> byte?) (<size> (the-number 1))) ;; (the-number <n>) returns a lambda that returns <n>
EOF is also built on top of byte-when: (byte-when identity eof-object? (the-number 0)).
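The optional arguments and the EOF definition can be sketched as follows. This is an illustration under assumptions, not the library's source: byte-when*, byte=*, and EOF* are hypothetical stand-ins, the input struct is redefined locally, and failure is signalled with #f.

```racket
#lang racket/base

;; Local copy of the input struct so the sketch is self-contained (an assumption).
(define-struct input (source pos) #:prefab)

;; (the-number n) returns a procedure that ignores its argument and returns n.
(define (the-number n) (lambda (_) n))

;; Hypothetical byte-when: peek one byte (or eof), check type and criterion,
;; and advance by (size v). isa? defaults to byte? and size to a constant 1.
(define (byte-when* satisfy? [isa? byte?] [size (the-number 1)])
  (lambda (in)
    (define v (peek-byte (input-source in) (input-pos in)))
    (if (and (isa? v) (satisfy? v))
        (values v (make-input (input-source in) (+ (input-pos in) (size v))))
        (values #f in))))

;; Matching one specific byte only needs the satisfy? argument.
(define (byte=* b) (byte-when* (lambda (v) (= v b))))

;; End of input: accept the eof object and advance by zero bytes.
(define EOF* (byte-when* (lambda (v) v) eof-object? (the-number 0)))
```

Overriding isa? with eof-object? is what lets the same constructor recognise the end of the input, and a size of 0 keeps the position where it was.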
The counterpart to byte-when for character-based parsers is char-when, which has the following signature:
(item <satisfy?> (<isa?> char?) char-utf-8-length)
The following are built on top of char-when:
(char= <c>) returns <c> when the next character equals <c>.
(char-not= <c>) is the opposite of char=.
(char-ci=? <c>) is the ci (case-insensitive) version of char=.
(char-not-ci=? <c>) is the opposite of char-ci=?.
(char-between <lc> <hc>) returns the next char when it falls between <lc> and <hc>.
(char-not-between <lc> <hc>) is the opposite of char-between.
(char-ci-between <lc> <hc>) is the ci version of char-between.
(char-ci-not-between <lc> <hc>) is the opposite of char-ci-between.
(char-in <chars>) returns the next char when it is one of the characters in <chars>.
(char-not-in <chars>) is the opposite of char-in.
(char-ci-in <chars>) is the ci version of char-in.
(char-ci-not-in <chars>) is the opposite of char-ci-in.
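The parsers in this list differ mainly in the predicate they apply to the peeked character. The sketch below shows what such predicates might look like using only standard Racket character functions. The names between?, ci-between?, in?, and negate* are hypothetical, and treating <chars> as a list is an assumption.

```racket
#lang racket/base

;; Hypothetical predicate builders, the kind of function that could be passed
;; as satisfy? to a char-when-style constructor.
(define (between? lc hc)    (lambda (c) (char<=? lc c hc)))
(define (ci-between? lc hc) (lambda (c) (char-ci<=? lc c hc)))
(define (in? chars)         (lambda (c) (and (memv c chars) #t)))

;; The "not" variants are the same predicates negated.
(define (negate* pred)      (lambda (c) (not (pred c))))
```

For example, ((between? #\a #\z) #\M) is false while ((ci-between? #\a #\z) #\M) is true, which is the difference between the plain and ci variants.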
literal abstracts over the parsers that perform an equality comparison (e.g., char=, byte=, etc.) and also lets an inner parser pass through, so you do not have to explicitly choose between char=, string=, etc., based on the argument. Example:
(literal #\a) ;; => (char= #\a)
(literal "abc") ;; => (string= "abc")
(literal any-byte) ;; => any-byte
literal-ci is the case-insensitive version of literal; the difference is that it returns the case-insensitive parser for characters and strings:
(literal-ci #\a) ;; => (char-ci= #\a)
(literal-ci "abc") ;; => (string-ci= "abc")
(literal-ci #"abc") ;; => (literal #"abc")
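The dispatch that literal and literal-ci perform can be pictured as a type test on the argument. The sketch below is only an illustration: literal* is a hypothetical function that returns a description of the parser that would be chosen rather than a real parser, and its handling of a single byte and of unsupported values is an assumption.

```racket
#lang racket/base

;; Hypothetical dispatcher: returns a tagged list naming the parser that would
;; be selected for the argument, or the argument itself if it is a procedure.
(define (literal* v #:ci? [ci? #f])
  (cond [(char? v)      (list (if ci? 'char-ci= 'char=) v)]
        [(string? v)    (list (if ci? 'string-ci= 'string=) v)]
        [(bytes? v)     (list 'bytes= v)]   ; bytes have no case-insensitive variant
        [(byte? v)      (list 'byte= v)]    ; assumption: a single byte maps to byte=
        [(procedure? v) v]                  ; an inner parser passes through unchanged
        [else (error 'literal* "unsupported literal: ~e" v)]))
```

With #:ci? set, only the character and string branches change, which mirrors how literal-ci falls back to literal for bytes.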
That about sums it up for the basic parsers. The next post will document the combinators. Stay tuned.