Tuesday, August 18, 2009

Continuation of Integration Features - HTTP Call & Proxy

In order to fulfill the role of an integrator, we want to add the ability to make HTTP calls to other services (either within your own network or 3rd party web services). The reason such a capability is needed is simple: unless we plan on developing everything under the sun, we'll need to interface with other applications that provide crucial and non-trivial services, which also speeds up your development.

PLT Scheme already provides basic URL fetching capability through the net/url module, which can serve as a basis for our development. Let's get started.

HTTPS Calls

The first thing to note is that net/url does not support SSL connections, so we'll need to add the support ourselves.

(require openssl
         scheme/tcp
         net/url
         mzlib/trace
         scheme/contract)

;; https->impure-port
;; base function to handle the connection over https
(define (https->impure-port method url (headers '()) (data #f))
  (let-values (((s->c c->s) (ssl-connect (url-host url)
                                         (if (url-port url) (url-port url) 443)))
               ((path) (make-url #f #f #f #f
                                 (url-path-absolute? url)
                                 (url-path url)
                                 (url-query url)
                                 (url-fragment url))))
    (define (to-server fmt . args)
      (display (apply format (string-append fmt "\r\n") args) c->s))
    ;; (trace to-server)
    (to-server "~a ~a HTTP/1.0" method (url->string path))
    (to-server "Host: ~a:~a" (url-host url)
               (if (url-port url) (url-port url) 443))
    (when data
      (to-server "Content-Length: ~a" (bytes-length data)))
    (for-each (lambda (header)
                (to-server "~a" header))
              headers)
    (to-server "")
    (when data
      (display data c->s))
    (flush-output c->s)
    (close-output-port c->s)
    s->c))

;; get-impure-port/https
;; a GET version of https call
(define (get-impure-port/https url (headers '()))
  (https->impure-port "GET" url headers))

;; post-impure-port/https
;; a POST version of https call
(define (post-impure-port/https url data (headers '()))
  (https->impure-port "POST" url headers data))
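To see what actually goes over the wire without opening a connection, here is a minimal sketch that writes the same request text into a string port. The name format-request and its argument list are made up for illustration, and the sketch is written against current Racket (#lang racket/base) rather than the PLT Scheme used in this post; both are assumptions.

```racket
#lang racket/base

;; format-request: hypothetical helper, not part of the code above.
;; Builds the same text that https->impure-port sends, but returns it
;; as a string instead of writing it to an SSL port.
(define (format-request method path host port headers data)
  (let ((out (open-output-string)))
    (define (to-server fmt . args)
      (display (apply format (string-append fmt "\r\n") args) out))
    (to-server "~a ~a HTTP/1.0" method path)
    (to-server "Host: ~a:~a" host port)
    (when data
      (to-server "Content-Length: ~a" (bytes-length data)))
    (for-each (lambda (header) (to-server "~a" header)) headers)
    (to-server "") ; the empty line separates the headers from the body
    (when data
      (display data out))
    (get-output-string out)))

;; a GET request with one extra header and no body
(format-request "GET" "/index.html" "example.com" 443 '("Accept: */*") #f)
```

Every line ends in CRLF, and the empty line is what tells the server that the headers are done.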

The code above provides get-impure-port/https and post-impure-port/https, which mimic net/url's get-impure-port and post-impure-port. We are only interested in the impure ports because we want to be able to work with the headers, and it's trivial to add a pure port on top of an impure one.
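As a rough sketch of how a pure port could be layered on top of an impure one, net/url's purify-port reads the status line and headers off a port and leaves it positioned at the body. The helper name impure->pure is made up for this example, the response is a canned string rather than a live connection, and the code targets current Racket (#lang racket/base); all of these are assumptions.

```racket
#lang racket/base
(require net/url)

;; impure->pure: hypothetical helper name.
;; Consumes the status line and headers, returns the same port,
;; which is now positioned at the start of the body.
(define (impure->pure in)
  (purify-port in)
  in)

;; a canned response standing in for the port returned by
;; get-impure-port/https
(define sample
  (open-input-string
   "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"))

(read-line (impure->pure sample))
```

The same wrapper should work for the port returned by get-impure-port/https, since it is an ordinary input port.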

With the above we can then create an abstraction over both https & http url fetching:

;; http-client-response holds all of the metadata (code, status, headers)
;; as well as the data stream
(define-struct http-client-response (version code reason headers input)
  #:property prop:input-port 4)

;; read-http-status
;; parse the status line of the response into (list version code reason),
;; or return #f if the line is not a valid status line.
(define (read-http-status in)
  (define (helper match)
    (if match
        (list (cadr match) (caddr match) (cadddr match))
        match))
  (helper (regexp-match #px"^HTTP/(\\d\\.\\d)\\s+(\\d+)\\s+(.+)$"
                        (read-folded-line in))))

;; a helper over the make-http-client-response
(define (*make-http-client-response in)
  (define (helper version code reason)
    (make-http-client-response version (string->number code) reason (read-headers in) in))
  (let ((status (read-http-status in)))
    (if (not status)
        (error 'make-http-client-response "invalid http response")
        (apply helper status))))

;; helper over url conversion
(define (url-helper url)
  (if (string? url)
      (string->url url)
      url))

;; converting headers over to headers that can be used by get/post-impure-port
(define (headers-helper headers)
  (map (lambda (kv)
         (format "~a: ~a" (car kv) (cdr kv)))
       headers))

;; http-get
;; abstraction over http GET
(define (http-get url (headers '()))
  (define (helper url)
    (*make-http-client-response
     ((if (string-ci=? (url-scheme url) "https")
          get-impure-port/https
          get-impure-port)
      url (headers-helper headers))))
  (helper (url-helper url)))

;; http-post
;; abstraction over http POST
(define (http-post url data (headers '()))
  (define (helper url)
    (*make-http-client-response
     ((if (string-ci=? (url-scheme url) "https")
          post-impure-port/https
          post-impure-port)
      url data (headers-helper headers))))
  (helper (url-helper url)))

The abstraction more or less follows net/url's approach, except that it wraps both the http and https procedures, adds convenient header handling, and provides an abstraction over the http response that parses all of the metadata yet still retains the values (unlike get/post-pure-port, which discard the status and headers).
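To illustrate how the response struct can carry the metadata and still be read like a port, here is a small sketch that works on a canned response. The struct name response and the helpers parse-status-line and port->response are cut-down stand-ins invented for this example (they skip the headers entirely), and the code targets current Racket (#lang racket/base); treat all of that as assumptions.

```racket
#lang racket/base

;; response: hypothetical, cut-down stand-in for http-client-response.
;; Field 3 (input) is what the struct behaves as when used as a port.
(define-struct response (version code reason input)
  #:property prop:input-port 3)

;; parse-status-line: returns (list version code reason) or #f
(define (parse-status-line line)
  (let ((m (and (string? line)
                (regexp-match #px"^HTTP/(\\d\\.\\d)\\s+(\\d+)\\s+(.+)$" line))))
    (and m (list (cadr m) (string->number (caddr m)) (cadddr m)))))

;; port->response: reads the status line and keeps the rest of the port
(define (port->response in)
  (let ((status (parse-status-line (read-line in 'return-linefeed))))
    (if status
        (make-response (car status) (cadr status) (caddr status) in)
        (error 'port->response "invalid http response"))))

(define r
  (port->response
   (open-input-string "HTTP/1.1 404 Not Found\r\n\r\nmissing")))

(response-code r) ; the parsed status code, as a number
(input-port? r)   ; the struct itself can be passed to read-line and friends
```

Because of prop:input-port, callers can check the code and reason first and then hand the very same value to anything that expects an input port.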

Reading and Parsing RFC822 Headers

RFC822-compliant headers require non-trivial treatment. While the concept of headers made of key/value pairs appears simple, in fact it is not, for many historical reasons that are well captured in the RFCs. Below is a list of things that we need to be aware of:
  • RFC822 headers might span multiple lines in the style of a "folded line" (the line continues if the following line starts with non-terminating whitespace, i.e. #\space or #\tab), and the folding might keep going indefinitely
  • The header values might contain comments (enclosed in parentheses), which are nestable; generally the comments should be ignored, but not always (for example, many servers generate date fields with a comment to denote the timezone)
  • Because of the traditional SMTP line width limitations, generating headers might require breaking a long line into a folded line at the width limit (RFC 2822 recommends at most 78 characters per line)
  • Also due to the traditionally ASCII-oriented nature of network protocols, there are two additional encodings (called Q and B, defined in RFC 2047) that parsers should be able to handle in order to parse headers correctly
It would be cool to support all of the above capabilities, but given that we are only using headers in HTTP situations (which generally are not subject to the SMTP line and encoding limits), we'll limit ourselves to the following for the immediate purpose:
  • generating a single line per header
  • parsing a folded line per header, without handling the encodings
  • assuming the character set is UTF-8 (otherwise throwing an error)
The following handles the folded line:

;; read-folded-line
;; read folded line according to RFC822.
(define (read-folded-line in)
  (define (folding? c)
    (or (equal? c #\space)
        (equal? c #\tab)))
  (define (return lines)
    (apply string-append "" (reverse lines)))
  (define (convert-folding lines)
    (let ((c (peek-char in)))
      (cond ((folding? c)
             (read-char in)
             (convert-folding lines))
            (else
             (helper (cons " " lines))))))
  (define (helper lines)
    (let ((l (read-line in 'return-linefeed)))
      (cond ((eof-object? l)
             (return lines))
            ;; an empty line ends the headers; whatever follows is data,
            ;; so never treat it as folding even if it starts with whitespace
            ((string=? l "")
             (return (cons l lines)))
            ((folding? (peek-char in))
             ;; we should keep going but first let's convert all folding whitespaces...
             (convert-folding (cons l lines)))
            (else
             ;; otherwise we are done...
             (return (cons l lines))))))
  (helper '()))
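For a quick feel of what folding does to a header, the same rule can be sketched on a whole string with a regular expression instead of reading from a port. The name unfold-headers is made up for this example, and the code targets current Racket (#lang racket/base); it is a simplification that assumes CRLF line endings, not a replacement for read-folded-line.

```racket
#lang racket/base

;; unfold-headers: hypothetical helper.
;; A CRLF followed by one or more spaces/tabs is a fold, so replace the
;; whole run with a single space; other line breaks stay untouched.
(define (unfold-headers str)
  (regexp-replace* #rx"\r\n[ \t]+" str " "))

;; the first header is folded across two lines, the second one is not
(unfold-headers "Subject: a long\r\n\tsubject line\r\nX-Other: 1\r\n")
```

In this sketch the folded Subject header comes back as a single line while the X-Other line is left alone, which is the same result read-folded-line produces one header at a time.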

Reading all of the headers then amounts to reading folded lines until we encounter an empty line (which is either EOF or the separator between the headers and the data).

;; a header is simply a pair of strings...
(define (header? h)
  (and (pair? h)
       (string? (car h))
       (string? (cdr h))))

;; header->string: does not generate terminator
(define (header->string h)
  (format "~a: ~a" (car h) (cdr h)))

;; string->header
;; convert a string into a header (a key/value pair);
;; returns #f if the line does not look like "key: value".
(define (string->header line)
  (define (helper match)
    (if match
        (cons (cadr match) (caddr match))
        #f))
  ;; the key may not contain whitespace, and the value may be empty
  (helper (regexp-match #px"^([^:\\s]+)\\s*:\\s*(.*)$" line)))

;; read-headers
;; RFC822 headers are actually non-trivial for parsing purposes:
;; they require the handling of "folded lines", which read-folded-line
;; takes care of.
(define (read-headers in)
  (define (return lines)
    ;; the lines were accumulated in reverse, so restore the original order
    (map string->header (reverse lines)))
  (define (helper lines)
    (let ((l (read-folded-line in)))
      (if (string=? l "") ;; we are done...
          (return lines)
          (helper (cons l lines)))))
  (helper '()))
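Once the headers are a list of key/value pairs, looking one up is straightforward. Below is a minimal sketch, assuming current Racket (#lang racket/base); parse-header is a standalone copy of the parsing idea above, and header-ref is a made-up helper that compares keys case-insensitively, since HTTP header names are not case sensitive.

```racket
#lang racket/base

;; parse-header: standalone stand-in for string->header
(define (parse-header line)
  (let ((m (regexp-match #px"^([^:\\s]+)\\s*:\\s*(.*)$" line)))
    (and m (cons (cadr m) (caddr m)))))

;; header-ref: hypothetical helper.
;; Returns the value of the first header whose key matches name
;; (ignoring case), or #f when there is no such header.
(define (header-ref headers name)
  (for/first ((h (in-list headers))
              #:when (string-ci=? (car h) name))
    (cdr h)))

(define headers
  (map parse-header '("Content-Type: text/html" "Content-Length: 42")))

(header-ref headers "content-length")
```

The value comes back as a string, so something like Content-Length still needs string->number before it can be used as a number.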

The above should cover the majority of cases that we'll encounter during interactions with web services, and we'll fix issues as we encounter them.

To be continued...
