Previously we looked at the fundamental parsers and the combinators API; now it is time to look at some common parsers provided by bzlib/parseq.
Since these parsers are built on top of the fundamental parsers and combinators, we show each one's definition along the way.
Character Category Parsers
digit is a character between #\0 and #\9.
(define digit (char-between #\0 #\9))
not-digit is a character not between #\0 and #\9.
(define not-digit (char-not-between #\0 #\9))
lower-case is a character between #\a and #\z.
(define lower-case (char-between #\a #\z))
upper-case is a character between #\A and #\Z.
(define upper-case (char-between #\A #\Z))
alpha is either a lower-case or an upper-case character.
(define alpha (choice lower-case upper-case))
alphanumeric is either an alpha character or a digit character.
(define alphanumeric (choice alpha digit))
whitespace is a space, return, newline, tab, or vertical tab character.
(define whitespace (char-in '(#\space #\return #\newline #\tab #\vtab)))
not-whitespace is any character that is not whitespace.
(define not-whitespace (char-not-in '(#\space #\return #\newline #\tab #\vtab)))
whitespaces parses zero or more whitespace characters:
(define whitespaces (zero-many whitespace))
ascii is a character whose code point is between 0 and 127:
(define ascii (char-between (integer->char 0) (integer->char 127)))
word is either an alphanumeric character or an underscore:
(define word (choice alphanumeric (char= #\_)))
not-word is any character that is not a word character:
(define not-word
  (char-when (lambda (c)
               (not (or (char<=? #\a c #\z)
                        (char<=? #\A c #\Z)
                        (char<=? #\0 c #\9)
                        (char=? c #\_))))))
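The predicate inside not-word is the exact complement of word. The following plain-Racket sketch checks that without involving parseq. word-char? and not-word-char? are hypothetical helper names. Racket's char<=? accepts three arguments, so a range test can be written as (char<=? lo c hi).

```racket
#lang racket/base
;; Plain-Racket sketch of the predicate behind word / not-word.
;; word-char? and not-word-char? are hypothetical helpers, not bzlib/parseq.
(define (word-char? c)
  (or (char<=? #\a c #\z)    ; lower-case
      (char<=? #\A c #\Z)    ; upper-case
      (char<=? #\0 c #\9)    ; digit
      (char=? c #\_)))       ; underscore

;; not-word accepts exactly the characters that word rejects
(define (not-word-char? c)
  (not (word-char? c)))

(for/list ([c (in-string "a_Z9 -")])
  (not-word-char? c))        ; → '(#f #f #f #f #t #t)
```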
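Finally, newline parses CRLF, CR, or LF. CRLF is tried first so that a CR immediately followed by an LF is consumed as a single newline rather than as a lone CR: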
(define newline
  (choice (seq r <- (char= #\return)
               n <- (char= #\newline)
               (return (list r n)))
          (char= #\return)
          (char= #\newline)))
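The order of the alternatives in newline determines how many line breaks "\r\n" counts as. The plain-Racket sketch below is not parseq. It counts line breaks using the same order: CRLF first, then a lone CR or LF. count-newlines is a hypothetical helper. If CR were tried before CRLF, each "\r\n" would count as two breaks: a CR, then an LF on its own.

```racket
#lang racket/base
;; Plain-Racket sketch (not bzlib/parseq): count line breaks with the same
;; ordering as newline. count-newlines is a hypothetical helper.
(define (count-newlines s)
  (let loop ([cs (string->list s)] [n 0])
    (cond
      [(null? cs) n]
      ;; CR immediately followed by LF: one line break, consume both chars
      [(and (char=? (car cs) #\return)
            (pair? (cdr cs))
            (char=? (cadr cs) #\newline))
       (loop (cddr cs) (add1 n))]
      ;; lone CR or lone LF: one line break
      [(memv (car cs) '(#\return #\newline))
       (loop (cdr cs) (add1 n))]
      ;; any other character
      [else (loop (cdr cs) n)])))

(count-newlines "a\r\nb\rc\nd")   ; → 3
```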
Number Parsers
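sign parses an optional + or - and defaults to + when neither is present:
;; accept either sign character; default to #\+ when there is none
(define sign (zero-one (char-in '(#\+ #\-)) #\+))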
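This sketch assumes zero-one returns its default without consuming input when the inner parser fails. Under that assumption, sign behaves like the plain-Racket function below. sign* is a hypothetical name; it is not part of bzlib/parseq. It returns the sign character and the remaining input as two values.

```racket
#lang racket/base
;; Plain-Racket sketch (not bzlib/parseq) of the described sign behaviour.
;; sign* is hypothetical; it returns (values sign-char remaining-chars).
(define (sign* cs)
  (if (and (pair? cs) (memv (car cs) '(#\+ #\-)))
      (values (car cs) (cdr cs))   ; explicit sign: consume it
      (values #\+ cs)))            ; no sign: default to #\+, input untouched

(sign* (string->list "-12"))   ; → #\- and '(#\1 #\2)
(sign* (string->list "12"))    ; → #\+ and '(#\1 #\2)
```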
natural parses one or more digits:
(define natural (one-many digit))
decimal parses a number with a decimal point: zero or more digits, a #\. character, then one or more digits:
(define decimal
  (seq number <- (zero-many digit)
       point <- (char= #\.)
       decimals <- natural
       (return (append number (cons point decimals)))))
positive parses either a decimal or a natural. Note that decimal needs to be placed first: natural would succeed on the integer part of a decimal and leave the rest of the input unparsed:
(define positive (choice decimal natural))
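Ordering matters here because choice commits to the first alternative that succeeds. The self-contained Racket sketch below makes this concrete. It uses hypothetical mini parsers, not the bzlib/parseq API. Each one is a plain function that takes a char list and returns (cons matched-chars remaining-chars), or #f on failure. The names digits, natural, decimal, choice, and run are illustrative only.

```racket
#lang racket/base
;; Hypothetical mini parsers (NOT bzlib/parseq): a parser takes a list of
;; chars and returns (cons matched-chars remaining-chars), or #f on failure.
(define (digit? c) (char<=? #\0 c #\9))

;; zero or more digits; always succeeds
(define (digits in)
  (let loop ([in in] [acc '()])
    (if (and (pair? in) (digit? (car in)))
        (loop (cdr in) (cons (car in) acc))
        (cons (reverse acc) in))))

;; one or more digits
(define (natural in)
  (define r (digits in))
  (and (pair? (car r)) r))

;; zero or more digits, a #\., then one or more digits
(define (decimal in)
  (define whole (digits in))
  (define after (cdr whole))
  (and (pair? after)
       (char=? (car after) #\.)
       (let ([frac (natural (cdr after))])
         (and frac
              (cons (append (car whole) (list #\.) (car frac))
                    (cdr frac))))))

;; ordered choice: the result of the first alternative that succeeds
(define (choice . ps)
  (lambda (in)
    (for/or ([p (in-list ps)]) (p in))))

;; run a parser on a string; returns (list matched remaining) as strings
(define (run p s)
  (define r (p (string->list s)))
  (and r (list (list->string (car r)) (list->string (cdr r)))))

(run (choice natural decimal) "3.14")   ; → '("3" ".14")  natural wins too early
(run (choice decimal natural) "3.14")   ; → '("3.14" "")
(run (choice decimal natural) "42")     ; → '("42" "")    decimal fails, natural succeeds
```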
The parsers above return the list of characters that make up a positive number. To get them to return actual numbers, and to parse both positive and negative numbers, we have a couple of helpers:
;; make-signed parses the sign followed by the number.
(define (make-signed parser)
  (seq +/- <- sign
       number <- parser
       (return (cons +/- number))))
;; make-number converts the parsed characters into a number.
(define (make-number parser)
  (seq n <- parser
       (return (string->number (list->string n)))))
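make-signed can simply cons the sign character onto the digits because Racket's string->number accepts a leading + or -. For the same reason, the default #\+ from sign is harmless. The plain-Racket sketch below applies the same two steps to hand-written char lists that stand in for parser results. signed-chars and chars->number are hypothetical helper names.

```racket
#lang racket/base
;; Plain-Racket sketch of the conversions done by make-signed and make-number,
;; applied to hand-written char lists standing in for parser results.
;; signed-chars and chars->number are hypothetical helpers.
(define (signed-chars sign digits)
  (cons sign digits))                    ; make-signed: sign char in front

(define (chars->number cs)
  (string->number (list->string cs)))    ; make-number: chars -> string -> number

(chars->number (signed-chars #\+ (string->list "42")))   ; → 42
(chars->number (signed-chars #\- (string->list "42")))   ; → -42
(chars->number (signed-chars #\- (string->list ".5")))   ; → -0.5
```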
Then natural-number parses and returns a natural number:
(define natural-number (make-number natural))
integer parses and returns a signed integer:
(define integer (make-number (make-signed natural)))
positive-number parses and returns a positive number (integer or real):
(define positive-number (make-number positive))
real-number parses and returns a signed number, integer or real:
(define real-number (make-number (make-signed positive)))
String Parsers
The following parsers parse a quoted string and return its inner content as a string.
escaped-char parses a character that is part of an escape sequence. It exists for sequences such as \n (which should return #\newline) and \" (which should return just #\"):
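;; escaped-char matches escape followed by char (a character or a parser);
;; it returns as when given, otherwise the matched character.
(define (escaped-char escape char (as #f))
  (seq (char= escape)
       c <- (if (char? char) (char= char) char)
       (return (if as as c))))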
;; e-newline
(define e-newline (escaped-char #\\ #\n #\newline))
;; e-return
(define e-return (escaped-char #\\ #\r #\return))
;; e-tab
(define e-tab (escaped-char #\\ #\t #\tab))
;; e-backslash
(define e-backslash (escaped-char #\\ #\\))
quoted parses a quoted string, including escape sequences:
;; quoted
;; a specific string-based bracket parser
(define (quoted open close escape)
  (seq (char= open)
       atoms <- (zero-many (choice e-newline
                                   e-return
                                   e-tab
                                   e-backslash
                                   (escaped-char escape close)
                                   (char-not-in (list close #\\))))
       (char= close)
       (return atoms)))
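To make the expected behaviour of quoted concrete, the hand-written scanner below mirrors its logic in plain Racket, without using parseq. It assumes a backslash escape, and close is the same as open, as in make-quoted-string's defaults. The scanner accepts \n, \r, \t, \\, and an escaped closing quote. It fails on any other escape, and it also fails if there is no closing quote. scan-quoted is a hypothetical name.

```racket
#lang racket/base
;; Hand-written sketch (NOT bzlib/parseq) of the scan quoted performs, assuming
;; open = close = q and escape = #\\. scan-quoted is hypothetical; it returns
;; (cons content-string remaining-chars), or #f on malformed input.
(define (scan-quoted in [q #\"])
  (and (pair? in)
       (char=? (car in) q)                                 ; opening quote
       (let loop ([in (cdr in)] [acc '()])
         (cond
           [(null? in) #f]                                 ; no closing quote
           [(char=? (car in) q)                            ; closing quote
            (cons (list->string (reverse acc)) (cdr in))]
           [(char=? (car in) #\\)                          ; escape sequence
            (and (pair? (cdr in))
                 (let* ([e (cadr in)]
                        [c (case e
                             [(#\n) #\newline]
                             [(#\r) #\return]
                             [(#\t) #\tab]
                             ;; \\ or an escaped quote stands for itself
                             [else (and (or (char=? e #\\) (char=? e q)) e)])])
                   (and c (loop (cddr in) (cons c acc)))))]
           [else (loop (cdr in) (cons (car in) acc))])))) ; ordinary char

(define result (scan-quoted (string->list "\"a\\\"b\\n\" rest")))
(car result)                  ; → "a\"b\n"
(list->string (cdr result))   ; → " rest"
```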
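make-quoted-string wraps quoted. The closing character defaults to the opening one, and the escape character defaults to #\\. The parsed characters are returned as a string.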
(define (make-quoted-string open (close #f) (escape #\\))
  (seq v <- (quoted open (if close close open) escape)
       (return (list->string v))))
Then single-quoted-string and double-quoted-string look like the following:
(define single-quoted-string (make-quoted-string #\'))
(define double-quoted-string (make-quoted-string #\"))
Finally, quoted-string parses either a single-quoted-string or a double-quoted-string:
(define quoted-string
  (choice single-quoted-string double-quoted-string))
That's it for now; next we will talk about parsing tokens. Enjoy.