If you need a refresher on parser combinators, you can read my previous posts for a quick tour.
Installation
(require (planet bzlib/parseq))
The package includes the following examples that you can inspect for the usage of the API:
(require (planet bzlib/parseq/example/csv) ;; a customizable csv parser
(planet bzlib/parseq/example/calc) ;; a simple calculator with parens
(planet bzlib/parseq/example/regex) ;; a simplified regular expression parser & evaluator
(planet bzlib/parseq/example/sql) ;; parsing SQL's create table statement
(planet bzlib/parseq/example/json) ;; parses JSON data format
)
Parser Type Signature & InputA parser is a function that has the following signature:
(-> Input/c (values any/c Input/c)) ;; returns the value and the next input
If the value is #f
then the parse has failed. This might be changed in the future to another value so you can return #f as a parsed value. The
input
is a struct with the following structure:
(define-struct input (source pos) #:prefab)
It is an abstraction over an input-port
so you can keep track of the current position on the port. The function make-input
will take in an input-port, a string, or a byte and return an input
struct with the position initiated to 0. During the parsing, the values are peeked instead of read so we can backtrack to the beginning. That means when you finished parsing, all of the data are still in the port.
make-reader
wraps over a parser so you can just pass an input-port
instead of needing to create an input
struct, and it also consumed the bytes if the parse is successful:
(make-reader parser) ;; => (-> input-port? any/c)
Fundamental Parsers (input not peeked)The following parsers do not peek into the input.
(return <v>)
returns <v>
that you specify. fail
returns a failed
struct that includes the position of the parse when the parser fails. The failed
struct is currently defined as follows:
(define-struct failed (pos) #:prefab)
succeeded?
and failed?
tests whether the returned value is a failed
struct (succeeded?
equals (compose not failed?)
). SOF
(start of file) will return 'sof
if it is the start of the input (i.e., position = 0). Fundamental Parsers (input peeked)
item
peeks the input, test the input to ensure it satisfies a criteria, and if so, returns the value and advance the port by the size of the peeked value:
(item <peek> <isa?> <satisfy?> <size>)
peek => (-> Input/c any/c)
isa? => (-> any/c boolean?)
satisfy? => (-> any/c any)
size => (-> any/c exact-integer?)
isa?
tests for the return value's type so you can simplify the writing of satisfy?
, which can assume the value is of the right type. You use
item
only when you need to create new parsers that the library do not already provide. Non-Character Type Parsers
bzlib/parseq
allows non-character parsers so you can use it to parse binary data instead of just text streams. You can mix them together of course. (bytes= <bytes>)
returns when the next set of bytes equals the passed in bytes. For example:
> ((bytes= #"foo") (make-input "foo bar"))
#"foo"
#s(input # 3)
(string= <string>)
returns when the next set of bytes (in string) equals the passed in string.(string-ci= <string>)
is the case-insensitive version of string=
.(byte= <byte>)
returns when the next byte equals the passed in byte. (bits= <bits>)
returns the next byte when it equals the passed in bits (a list of 8 0's or 1's). byte=
and bits=
are built on top of byte-when
, which is built on top of item
. byte-when
has the following signature:
(byte-when <satisfy?> (<isa?> byte?) (<size> (the-number 1)))
;; (the-number <n>) returns a lambda that returns <n>
EOF
is also built on top of byte-when
as (byte-when identity eof-object? (the-number 0))
. Character-Based Parsers
The counterpart to
byte-when
for character-based parsers is char-when
, which has the following signature:
(item <satisfy?> (<isa?> char?) char-utf-8-length)
The following are built on top of
char-when
: (char= <c>)
returns <c>
when the next character equals <c>
. (char-not= <c>)
is the opposite of char=
.(char-ci=? <c>)
is the ci (case-insensitive) version of char=
.(char-not-ci=? <c>)
is the opposite of char-ci=?
.(char-between <lc> <hc>)
returns the next char when it falls between <lc>
and <hc>
.(char-not-between <lc> <hc>)
is the opposite of char-between
.(char-ci-between <lc> <hc>)
is the ci version of char-between
.(char-ci-not-between <lc> <hc>)
is the opposite of char-ci-between
.(char-in <chars>)
returns the next char when it is one of the characters in <chars>
.(char-not-in <chars>)
is the opposite of char-in
.(char-ci-in <chars>)
is the ci version of char-in
.(char-ci-not-in <chars>)
is the opposite of char-ci-in
.Literal Parsers
literal
is used to abstract the parsers that basically performs an equal comparison (e.g., string=
, byte=
, etc), as well as allowing an inner parser to pass through, so you do not have to explicitly choose between char=
, string=
, etc., based on the argument. Example:
(literal #\a) ;; => (char= #\a)
(literal "abc") ;; => (string= "abc")
(literal any-byte) => any-byte
literal-ci
is the case-insensitive version of literal
, the difference is that it will return the case-insensitive parser for character and string:
(literal-ci #\a) ;; => (char-ci= #\a)
(literal-ci "abc") ;; => (string-ci= "abc")
(literal-ci #"abc") ;; => (literal #"abc")
That about sums it up for the basic parsers. The next post will document the combinators. Stay tuned.
Dear all
ReplyDeleteI am a student at AL-QUDS University in Palestine, I need your help
as possible .
Please help me by showing me how can I use Scheme or by DrRacket software to
build the parser for this Grammar:
expression -> number expression_tail
expression_tail -> + expression expression_tail
expression_tail -> Ø
Example on valid expression: "5 + 4 + 3 + 6 +7”