bzlib/parseq
]Previously we have looked at the basic parsers that peeked into inputs, now it's time to look at the combinators.
Basic Parser Combinators
bind
is the "bind operator" for the parsers, with the following signature:
(-> Parser/c (-> any/c Parser/c) Parser/c)
It takes in a parser, a "transform" function that will consume the returned value from the first parser and transform the value into another parser, and combine both into another parser. As there are other more specialized combinators, you should not need to use
bind
directly unless you are trying to compose functions during run-time.result
simplifies the transform function you need to write so your transform function only need to return the value, not another parser. Below is the definition of result
:
(define (result parser helper)
(bind parser
(lambda (v)
(if v
(return (helper v))
fail))))
result*
works like result
, but it only works with a return value that is a list, and it applies the transform against the value. This comes in handy when you know the parser returns a list and you want to bind the list to individual arguments in your transform function. Below is an example (sequence*
combinator is explained later):
(result* (sequence* (char= #\b) (char= #\a) (char= #\r))
(lambda (b a r) ;; b maps to #\b, a maps to #\a, and r maps to #\r
(list->string (list b a r))))
Multi-Parser Combinatorsseq
is a macro-based combinator that bounds multiple parsers in succession, and returns the results only if all parsers succeed. The design of this parser is inspired by Shaurz's Haskell-style parser combinator.The default form of
seq
is (seq <parser>)
, which is equivalent of <parser>
. An seq
expression must end in the first form. The second form of
seq
allows you to create lexical binding against the value of the intermediate parser so you can manipulate the result with the last parser. An example to parse 2 separate digit and sum by with the first being multiplied by 2 and the second multiplied by 4:
(seq a <- (char= #\0 #\9)
b <- (char= #\0 #\9)
(return (+ (* 2 (string->number (string a)))
(* 4 (string->number (string b))))))
This form, however, cannot be the last statement in seq
, i.e., you must have other parser statements after a binding (otherwise why do you need the binding?). The last form of
seq
behaves like the second form but does not create a binding, so it looks like the first form. An example is to parse away the parentheses surrounding a digit:
(seq (char= #\() ;; 3rd form - no lexical binding
digit <- (char= #\0 #\9) ;; 2nd form - binds digit
(char= #\)) ;; 3rd form - no lexical binding
(return (string->number (string digit))) ;; 1st form
)
seq
is perhaps the most versatile combinator that you will use to create other combinators and parsers. sequence
is the functional version of seq
that takes a list of parsers as its argument:
(sequence (list parser ...))
sequence
has the following limitations: sequence
do not allow for custom bindingssequence
cannot be used to define recursive parser (its inner parser must be defined first)sequence
do not handle custom transformation of the results - you should use it withresult*
to bind the transformation (see above).
sequence*
is the variable args version of sequence
, i.e. (sequence* parser ...)
. choice
is a macro combinator that combines mutliple parsers so the first one that succeeds will return the value:
;; returns if the next char is either #\a, #\b, or #\c. Else fail.
(choice (char= #\a) (char= #\b) (char= #\c))
one-of
is the functional version of choice
that takes a list of parsers as its argument:
(one-of (list parser ...))
Since this is an function combinator, unlike choice
, you cannot define recursive parser with it.one-of*
is the variable arg version of one-of
, i.e., (one-of* parser ...)
.all-of
requires all of the inner parsers to succeed, and it returns the result from the last parser:
;; an contrived example: parse successfully if the next char is #\e
(all-of (list alpha (char-between #\d #\f) (char= #\e)))
Remember it is the last result being returned, so if you inner parsers requires different amount of bytes matched, make sure the one that peek the most bytes gets matched last. all-of*
is the variable arg list version of all-of
.Repetition-Based Combinators
repeat
is the most flexible repetition-based combinators:
(repeat <parser> <minimum> <maximum>)
The minimum
argument defaults to 1, and the maximum
argument defaults to positive infinity, which corresponds to the one-many
combinator:
(define (one-many parser)
(repeat parser 1 +inf.0))
And zero-many
maps to:
(define (zero-many parser)
(repeat parser 0 +inf.0))
Note that all three parsers will return a list that contains each of the successful parses. zero-many
will return a list even if the inner parser does not match, so zero-many
will always successfully parse. zero-one
is a special repetition-based combinator in that it does not return a list. Instead, it allows you to parse for a single occurence, and if it fails, you can specify a default value to substitute for the failure:
;; example of parsing for "foo" but returns "bar" if the parse failed.
(zero-one (string= "foo") "bar")
Hence zero-one
will always succeed as well. That's it for the combinators. We'll discuss some pre-defined parsers in the next post. Stay tuned.
No comments:
Post a Comment