- converting between xexpr and sxml
- reading sxml/xexpr from html or xml sources
- managing html and xml entities
(require (planet bzlib/xml))
Xexpr and SXML
Although xexpr is the default XML representation in PLT Scheme (and in the web-server), it lacks the toolkits that SXML enjoys. bzlib/xml bridges the gap with functions that convert between SXML and xexpr.
Converting from xexpr to SXML lets you use facilities such as sxpath, ssax, and sxml-match with xexpr, and converting from SXML to xexpr lets you feed SXML into the web-server to generate output.
;; convert from xexpr to sxml
(xexpr->sxml `(p ((class "default")) "an xexpr instance"))
;; => `(p (@ (class "default")) "an xexpr instance")
;; convert from sxml to xexpr
(sxml->xexpr `(p (@ (class "default")) "an xexpr instance"))
;; => `(p ((class "default")) "an xexpr instance")
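To make the shape of this conversion concrete, here is a minimal sketch in Racket (the successor to PLT Scheme) of the core element rewrite: an xexpr attribute list ((name "value") ...) becomes an SXML (@ (name "value") ...) node, and vice versa. The names xexpr->sxml* and sxml->xexpr* are hypothetical; this is not bzlib/xml's implementation, and it ignores SXML special nodes such as *TOP* and *PI*.

```racket
#lang racket/base
;; Hypothetical sketch of xexpr <-> SXML element conversion (not bzlib/xml's code).

;; An xexpr attribute list is a (possibly empty) list of (symbol "value") pairs.
(define (attr-list? x)
  (and (list? x)
       (andmap (lambda (a)
                 (and (list? a) (= (length a) 2) (symbol? (car a))))
               x)))

;; (tag ((k v) ...) child ...) => (tag (@ (k v) ...) child ...)
(define (xexpr->sxml* x)
  (if (and (pair? x) (symbol? (car x)))
      (let ([tag (car x)] [rest (cdr x)])
        (if (and (pair? rest) (attr-list? (car rest)))
            (let ([attrs (car rest)]
                  [kids (map xexpr->sxml* (cdr rest))])
              ;; Drop an empty attribute list instead of emitting (@).
              (if (null? attrs)
                  (cons tag kids)
                  (cons tag (cons (cons '@ attrs) kids))))
            (cons tag (map xexpr->sxml* rest))))
      x)) ; strings, entity symbols and numbers pass through unchanged

;; (tag (@ (k v) ...) child ...) => (tag ((k v) ...) child ...)
(define (sxml->xexpr* s)
  (if (and (pair? s) (symbol? (car s)))
      (let ([tag (car s)] [rest (cdr s)])
        (if (and (pair? rest) (pair? (car rest)) (eq? (caar rest) '@))
            (cons tag (cons (cdar rest) (map sxml->xexpr* (cdr rest))))
            (cons tag (map sxml->xexpr* rest))))
      s))
```

Nested elements are handled by recursing over the children, so (div ((id "x")) (p "hi")) becomes (div (@ (id "x")) (p "hi")).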
Reading and Writing Xexpr/SXML
bzlib/xml provides read-xexpr and read-sxml to simplify the conversion from HTML or XML sources into either xexpr or SXML:
;; reading xexpr
(read-xexpr <input-port?>)
;; reading sxml
(read-sxml <input-port?>)
The <input-port?> argument can also be an http-client-response structure, defined in bzlib/http, whose content-type header helps determine whether the document is HTML or XML. For example:
(read-sxml (http-get "http://www.google.com/"))
There are corresponding write-xexpr and write-sxml functions for output:
;; write-xexpr
(write-xexpr <xexpr?> <output-port?>)
;; write-sxml
(write-sxml <sxml?> <output-port?>)
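For comparison, the standard xml collection that ships with PLT Scheme (now Racket) can already parse well-formed XML into an xexpr and serialize it back. This sketch assumes only that library (read-xml, document-element, xml->xexpr, xexpr->string); port->xexpr is a hypothetical helper. Unlike read-xexpr, it does not parse HTML, accept http-client-response inputs, or inspect content-type headers.

```racket
#lang racket/base
;; Round-trip XML text through an xexpr using only Racket's standard xml library.
;; Well-formed XML only: HTML input would be rejected by read-xml.
(require xml)

;; Hypothetical helper: parse a document from a port and return its root element.
(define (port->xexpr in)
  (xml->xexpr (document-element (read-xml in))))

(define doc "<p class=\"default\">an xexpr instance</p>")

;; Parse into an xexpr.
(define xe (port->xexpr (open-input-string doc)))

;; Serialize the xexpr back to XML text.
(define text (xexpr->string xe))
```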
Converting from xexpr to SXML also means normalizing XML entities. Xexpr leaves named entities as symbols and numeric entities as numbers instead of converting them into final strings, so bzlib/xml provides an entity->string routine that converts an entity into a string for you:
(entity->string <symbol or number entity>)
read-sxml calls it automatically, so you generally do not need to use it explicitly, except when you want to extend the entity mappings.
entity->string converts numeric entities by mapping them against the Unicode character map, and symbol entities via two separate mapping tables: one of predefined HTML entities, and a parameterizable one of XML entities.
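The lookup described above can be sketched as follows. entity->string*, html-entity-table and xml-entity-table are hypothetical stand-ins with a few sample entries, not bzlib/xml's definitions. In particular, checking the XML table before the HTML table is an assumption of this sketch, not documented behavior. Entities found in neither table, including out-of-range code points, fall back to the null character.

```racket
#lang racket/base
;; Hypothetical sketch of entity->string-style lookup; not bzlib/xml's code.

;; Sample HTML table: symbol -> Unicode code point.
(define html-entity-table '((nbsp . 160) (lt . 60) (gt . 62) (amp . 38)))

;; Sample XML table: symbol -> replacement string (hypothetical entry).
(define xml-entity-table '((copy-notice . "(c) bzlib")))

;; True for integers that integer->char accepts (excludes surrogates).
(define (valid-code-point? n)
  (and (exact-nonnegative-integer? n)
       (or (< n #xD800)
           (and (> n #xDFFF) (< n #x110000)))))

(define (entity->string* e)
  (cond
    ;; Numeric entity such as &#955; arrives as 955: map it to its character.
    [(valid-code-point? e) (string (integer->char e))]
    ;; Symbol entity: XML table first (assumed order), then the HTML table.
    [(and (symbol? e) (assq e xml-entity-table)) => cdr]
    [(and (symbol? e) (assq e html-entity-table))
     => (lambda (pair) (string (integer->char (cdr pair))))]
    ;; Anything else is unknown and maps to the null character.
    [else (string #\nul)]))
```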
The HTML entity table maps a predefined set of HTML entities to their numeric character codes. You generally should not need to modify this set, but if you do, you can use set-html-entities!:
(set-html-entities! <list of symbol/integer pairs>)
;; example
(set-html-entities! `((nbsp . 160) (lt . 60) (gt . 62)))
The XML entity table (xml-entities) is a parameter that takes a list of symbol/string pairs:
(parameterize ((xml-entities '((lt . "<") (gt . ">"))))
  (read-sxml ...))
If you use the same symbol entity in both tables, the …
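Because xml-entities is a parameter, a table installed with parameterize applies only within that dynamic extent and the previous table comes back afterwards. Here is a minimal sketch of that mechanism using a hypothetical my-xml-entities parameter and lookup-xml-entity helper rather than bzlib/xml itself:

```racket
#lang racket/base
;; Hypothetical illustration of a parameterizable entity table (not bzlib/xml's API).

;; Default table: no custom XML entities.
(define my-xml-entities (make-parameter '()))

;; Return the replacement string for a symbol entity, or #f if it is unknown.
(define (lookup-xml-entity sym)
  (cond [(assq sym (my-xml-entities)) => cdr]
        [else #f]))

;; Inside parameterize, the custom table is visible...
(define inside
  (parameterize ([my-xml-entities '((lt . "<") (gt . ">"))])
    (lookup-xml-entity 'lt)))

;; ...and outside it, the default (empty) table is in effect again.
(define outside (lookup-xml-entity 'lt))
```

Scoping the table this way lets different documents use different entity sets without mutating global state, which is the usual reason to prefer a parameter over a setter.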
Any unknown entity is mapped to the null character.
That's it for now. Enjoy.