Tuesday, December 22, 2009

BZLIB/XML.plt - An XML Utility for Xexpr and SXML

BZLIB/XML.plt is now available via PLANET. It provides the following utilities to help with XML manipulation:
  • converting between xexpr and sxml
  • reading sxml/xexpr from html or xml sources
  • managing html and xml entities

Installation


(require (planet bzlib/xml)) 

Xexpr and SXML

Although xexpr is the default xml representation in PLT Scheme (and the web-server), it lacks the tooling that SXML enjoys. bzlib/xml bridges the gap with functions that convert between sxml and xexpr:

;; convert from xexpr to sxml 
(xexpr->sxml `(p ((class "default")) "an xexpr instance"))
;; => `(p (@ (class "default")) "an xexpr instance") 

;; convert from sxml to xexpr 
(sxml->xexpr `(p (@ (class "default")) "an xexpr instance"))
;; => `(p ((class "default")) "an xexpr instance") 
Converting from xexpr to sxml lets you use facilities such as sxpath, ssax, and sxml-match with xexpr, and converting from sxml to xexpr lets you feed sxml into the web-server to generate output.
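To make the difference between the two shapes concrete, here is a minimal sketch of the conversion in plain Racket. It is not the bzlib/xml implementation: sketch-xexpr->sxml, sketch-sxml->xexpr, and attr-list? are hypothetical names made up for illustration, and the sketch only moves the attribute list in and out of the (@ ...) form. It leaves symbol and numeric entities untouched.

```racket
#lang racket
;; Hypothetical sketch, NOT the bzlib/xml implementation.
;; xexpr element: (tag ((name "value") ...) child ...) or (tag child ...)
;; sxml element:  (tag (@ (name "value") ...) child ...) or (tag child ...)

;; An attribute list is a list whose items are all (symbol string) pairs.
(define (attr-list? v)
  (and (list? v)
       (andmap (lambda (a)
                 (and (list? a)
                      (= (length a) 2)
                      (symbol? (car a))
                      (string? (cadr a))))
               v)))

(define (sketch-xexpr->sxml x)
  (cond
    ;; strings and other atoms pass through unchanged
    [(not (pair? x)) x]
    ;; element with an attribute list: wrap the attributes in (@ ...)
    [(and (pair? (cdr x)) (attr-list? (cadr x)))
     (let ([attrs (cadr x)]
           [kids (map sketch-xexpr->sxml (cddr x))])
       (if (null? attrs)
           (cons (car x) kids)
           (cons (car x) (cons (cons '@ attrs) kids))))]
    ;; element without attributes: just convert the children
    [else (cons (car x) (map sketch-xexpr->sxml (cdr x)))]))

(define (sketch-sxml->xexpr s)
  (cond
    [(not (pair? s)) s]
    ;; element whose first child is (@ ...): unwrap it into a plain list
    [(and (pair? (cdr s)) (pair? (cadr s)) (eq? (car (cadr s)) '@))
     (cons (car s)
           (cons (cdr (cadr s))
                 (map sketch-sxml->xexpr (cddr s))))]
    [else (cons (car s) (map sketch-sxml->xexpr (cdr s)))]))
```

Applied to the example above, (sketch-xexpr->sxml '(p ((class "default")) "an xexpr instance")) gives '(p (@ (class "default")) "an xexpr instance"), and sketch-sxml->xexpr reverses it.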

Reading and Writing Xexpr/SXML

bzlib/xml provides read-xexpr and read-sxml to simplify the conversion from html sources to either xexpr or sxml:


;; reading xexpr 
(read-xexpr <input-port?>) 

;; reading sxml
(read-sxml <input-port?>) 
The <input-port?> argument can also be an http-client-response structure, defined in bzlib/http. Its content-type header helps determine whether the document is html or xml. For example:

(read-sxml (http-get "http://www.google.com/")) 
There are corresponding write-sxml and write-xexpr functions:

;; write-xexpr 
(write-xexpr <xexpr?> <output-port?>) 
;; write-sxml
(write-sxml <sxml?> <output-port?>) 

Managing Entities

Converting from xexpr to sxml requires normalizing the xml entities. xexpr represents named entities as symbols and numeric entities as numbers rather than converting them into their final strings, so bzlib/xml provides an entity->string routine that converts an entity into a string for you.

(entity->string <symbol or number entity>) 
This is automatically called by xexpr->sxml, read-xexpr, and read-sxml, so you generally do not have to call it explicitly, except to extend the entity mapping.

entity->string converts numeric entities via the unicode character map, and symbol entities via two separate entity mapping tables: one for predefined HTML entities, and a parameterizable one for XML entities.

The HTML entity table contains a set of predefined HTML entities, each mapped to its numeric character code. Generally you should not have to modify this set of entities, but if you need to, you can do so via set-html-entities!:

(set-html-entities! <list of symbol/integer pairs>) 
;; example
(set-html-entities! `((nbsp . 160) (lt . 60) (gt . 62))) 

The XML entity table (xml-entities) is parameterizable, and it takes a list of symbol and string pairs:

(parameterize ((xml-entities '((lt . "<") (gt . ">")))) 
  (read-sxml ...)) 
If the same symbol entity appears in both tables, the entry in xml-entities takes precedence.

Any unknown entity is mapped to the null character.
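
The lookup order described above can be sketched in plain Racket as follows. This is an illustration under assumptions, not the library's code: sketch-entity->string, sketch-html-entities, and sketch-xml-entities are hypothetical stand-ins for entity->string and its two tables, and the table contents are only a small sample.

```racket
#lang racket
;; Hypothetical sketch of the lookup order; names and table contents
;; are illustrative, not bzlib/xml's internals.

;; Sample of an HTML table: symbol -> numeric character code.
(define sketch-html-entities
  '((nbsp . 160) (lt . 60) (gt . 62) (amp . 38)))

;; Parameterizable XML table: symbol -> replacement string.
(define sketch-xml-entities (make-parameter '()))

(define (sketch-entity->string e)
  (cond
    ;; numeric entity: look up the unicode character directly
    ;; (integer->char raises an error for invalid code points)
    [(exact-nonnegative-integer? e) (string (integer->char e))]
    ;; symbol entity: the XML table is consulted first ...
    [(assq e (sketch-xml-entities)) => cdr]
    ;; ... then the HTML table
    [(assq e sketch-html-entities)
     => (lambda (p) (string (integer->char (cdr p))))]
    ;; unknown entity: the null character
    [else (string #\nul)]))
```

With these sample tables, (sketch-entity->string 'lt) returns "<", while wrapping the call in (parameterize ([sketch-xml-entities '((lt . "[lt]"))]) ...) makes it return "[lt]", because the XML table is checked first.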

That's it for now. Enjoy.
