- convert between xexpr and sxml
- reading sxml/xexpr from html or xml sources
- managing html and xml entities
Installation
(require (planet bzlib/xml))
Xexpr and SXML
Although Xexpr is the default XML representation in PLT Scheme (and the web-server), it lacks the toolkits that SXML enjoys. bzlib/xml helps by providing functions to convert between sxml and xexpr:
;; convert from xexpr to sxml
(xexpr->sxml `(p ((class "default")) "an xexpr instance"))
;; => `(p (@ (class "default")) "an xexpr instance")
;; convert from sxml to xexpr
(sxml->xexpr `(p (@ (class "default")) "an xexpr instance"))
;; => `(p ((class "default")) "an xexpr instance")
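To make the difference between the two shapes concrete, here is a minimal sketch of such a conversion in plain present-day Racket. It is not the bzlib/xml implementation: the names sketch-xexpr->sxml and sketch-sxml->xexpr are hypothetical, and the sketch only moves the attribute list in and out of the (@ ...) form, leaving entities and special SXML nodes such as *TOP* untouched.

```racket
#lang racket/base

;; An xexpr attribute list is a (possibly empty) list of (name "value") lists.
(define (attribute-list? v)
  (and (list? v)
       (andmap (lambda (a)
                 (and (pair? a) (symbol? (car a))
                      (pair? (cdr a)) (string? (cadr a))
                      (null? (cddr a))))
               v)))

;; Hypothetical helper: wrap a non-empty xexpr attribute list in (@ ...).
(define (sketch-xexpr->sxml x)
  (cond
    [(and (pair? x) (symbol? (car x)))
     (define tail (cdr x))
     (define has-attrs? (and (pair? tail) (attribute-list? (car tail))))
     (define attrs (if has-attrs? (car tail) '()))
     (define body (if has-attrs? (cdr tail) tail))
     (append (list (car x))
             (if (null? attrs) '() (list (cons '@ attrs)))
             (map sketch-xexpr->sxml body))]
    [else x])) ; strings, symbols, and numbers pass through unchanged

;; Hypothetical helper: unwrap (@ ...) into an explicit attribute list.
(define (sketch-sxml->xexpr s)
  (cond
    [(and (pair? s) (symbol? (car s)))
     (define tail (cdr s))
     (define has-attrs?
       (and (pair? tail) (pair? (car tail)) (eq? (caar tail) '@)))
     (define attrs (if has-attrs? (cdar tail) '()))
     (define body (if has-attrs? (cdr tail) tail))
     (append (list (car s) attrs) (map sketch-sxml->xexpr body))]
    [else s]))
```

The sxml->xexpr direction in this sketch always emits an attribute list, even an empty one, which is still a valid xexpr.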
Converting from xexpr to sxml lets you use facilities such as sxpath, ssax, and sxml-match with xexpr, and converting from sxml to xexpr lets you feed sxml into the web-server to generate output.
Reading and Writing Xexpr/SXML
bzlib/xml provides read-xexpr and read-sxml to simplify the conversion from html sources to either xexpr or sxml:
;; reading xexpr
(read-xexpr <input-port?>)
;; reading sxml
(read-sxml <input-port?>)
The <input-port?> can be an http-client-response structure defined in bzlib/http, which provides a content-type header that helps determine whether the document is html or xml. For example:
(read-sxml (http-get "http://www.google.com/"))
There are corresponding write-sxml and write-xexpr functions:
;; write-xexpr
(write-xexpr <xexpr?> <output-port?>)
;; write-sxml
(write-sxml <sxml?> <output-port?>)
Managing Entities
As part of converting from xexpr to sxml, you'll need to normalize the xml entities. Xexpr represents named entities as symbols and numeric entities as numbers instead of converting them into final strings, so bzlib/xml provides an entity->string routine that converts an entity into a string for you.
(entity->string <symbol or number entity>)
This is automatically called by xexpr->sxml, read-xexpr, and read-sxml, so you generally do not have to use it explicitly, except to extend the entity mapping.
entity->string converts numeric entities via the unicode character map, and symbol entities via two separate entity mapping tables: one for predefined HTML entities, and a parameterizable one for XML entities.
The HTML entity table contains a set of predefined HTML entities mapped to their underlying numeric character codes. Generally you should not have to modify this set of entities, but if you need to, you can do so via set-html-entities!:
(set-html-entities! <list of symbol/integer pairs>)
;; example
(set-html-entities! `((nbsp . 160) (lt . 60) (gt . 62)))
The XML entity table (xml-entities) is parameterizable, and it takes a list of symbol and string pairs:
(parameterize ((xml-entities '((lt . "<") (gt . ">"))))
  (read-sxml ...))
If you use the same symbol entity in both tables, the xml-entities entry takes precedence. Any unknown entity is mapped to the null character.
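The lookup order described above can be sketched in plain present-day Racket as follows. This is only an illustration under stated assumptions, not the library's code: sketch-html-entities, sketch-xml-entities, and sketch-entity->string are hypothetical stand-ins, the HTML table holds just three entries, and returning a one-character string for the null character is a guess at how "mapped to the null character" is represented.

```racket
#lang racket/base

;; Hypothetical stand-in for the predefined HTML table: symbol -> code point.
(define sketch-html-entities
  (make-hasheq '((nbsp . 160) (lt . 60) (gt . 62))))

;; Hypothetical stand-in for the parameterizable XML table: symbol -> string.
(define sketch-xml-entities (make-parameter '()))

(define (sketch-entity->string e)
  (cond
    ;; Numeric entities map straight to the character with that code point.
    [(number? e) (string (integer->char e))]
    ;; The XML table is consulted first, so it wins on a clash.
    [(assq e (sketch-xml-entities)) => cdr]
    ;; Then the HTML table, whose values are numeric character codes.
    [(hash-ref sketch-html-entities e #f)
     => (lambda (code) (string (integer->char code)))]
    ;; Unknown entities fall back to the null character (assumed as a string).
    [else (string #\nul)]))
```

Wrapping a call in (parameterize ((sketch-xml-entities '((lt . "LESS")))) ...) makes the lt lookup return "LESS" instead of "<", which mirrors the precedence rule.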
That's it for now. Enjoy.