Thursday, October 1, 2009

Handling Time Zone in Scheme (5): Pre-Convert the Rules

This is part five of the timezone series - you can find the previous posts on this subject to get up to speed on the details:
  1. motivation and overview
  2. parsing the zoneinfo database
  3. compute the offsets
  4. convert the rules into dates

We previously have finished the first draft of tz-offset, and it works correctly in majority of the situations. But unfortunately, there is one issue with it:
The code will not work correctly when there are gaps between the years where the rules are applicable.
Fortunately, under regular use of the code, we will not encounter this issue. But the issue definitely exists. Below is the US rule:

  ("US"
   (1918 1919 - 3 (last 0) (2 0 0 w) 3600 "D")
   (1918 1919 - 10 (last 0) (2 0 0 w) 0 "S")
   (1942 1942 - 2 9 (2 0 0 w) 3600 "W")
   (1945 1945 - 8 14 (23 0 0 u) 3600 "P")
   (1945 1945 - 9 30 (2 0 0 w) 0 "S")
   (1967 2006 - 10 (last 0) (2 0 0 w) 0 "S")
   (1967 1973 - 4 (last 0) (2 0 0 w) 3600 "D")
   (1974 1974 - 1 6 (2 0 0 w) 3600 "D")
   (1975 1975 - 2 23 (2 0 0 w) 3600 "D")
   (1976 1986 - 4 (last 0) (2 0 0 w) 3600 "D")
   (1987 2006 - 4 (match 0 >= 1) (2 0 0 w) 3600 "D")
   (2007 +inf.0 - 3 (match 0 >= 8) (2 0 0 w) 3600 "D")
   (2007 +inf.0 - 11 (match 0 >= 1) (2 0 0 w) 0 "S"))
You can see that there is a gap between 1945 and 1967 for applicable rules. You probably are not going to calculate the offsets for 1948 most of the time, but it would be nice if we do not have to face the potential issues.

What we want is to have the last rule of 1945 continue to be applicable until 1967 in this case. And that means we need to be able to skip over multiple years, which our current design does not account for. The nice thing is that the situation is not as dire as it sounds, since most of the timezones that I inspected visually do not make use of the "US" rule during this gap. But it would be nice to know such potential bug will not exist.

As we have found out in the previous posts, inferring the "previous" rules with the data format is difficult, and it's easier to compute the applicable rules for a given year, the easiest solution is to pre-compute the dates for every year that we need to worry about. In this case, any of the gaps will automatically be filled with the previously applicable rules, with the applicable years.

The Applicable Years

It should be obvious that the timezone concept has definite bound toward the past, as GMT was not established until 1675, and US does not adopt the timezone until 1918. The minimum year from the zoneinfo database is 1916:


(define (zone-year (aggregate min))
  (define (until-helper zone) 
    (let ((until (zone-until zone))) 
      (if (date? until) (date-year until) #f)))
  (let ((rules (filter identity 
                       (flatten 
                        (hash-map zones
                                  (lambda (key zone)
                                    (map zone-rule zone))))))
        (untils (hash-map zones (lambda (key zone)
                                  (map until-helper zone)))))
    (apply aggregate 
           (filter (lambda (date)
                     (and (number? date) 
                          (not (equal? date +inf.0))
                          (not (equal? date -inf.0)))) 
                   (flatten (append untils 
                                    (map rule-from rules)
                                    (map rule-to rules)))))))

(zone-year min) ;; ==> 1916. 

And while there is no clear upper bound for the time zones, we can follow some conventions to help establish such upper bounds to avoid infinite time zone generation, which would have likely to be incorrect anyways, since timezone and daylight saving times can easily be changed in the future at the whim of politicians and governments. A good upper bound is 2038, since that's the Y2K problem for Unix, and it will give us plenty of time to update the library. This number coincidentally is also the largest number in the zoneinfo database:

(zone-year max) ;; ==> 2038 
The two years now forms our range for calculation all applicable rules:

(define ZIC-MIN (zone-year min)) 
(define ZIC-MAX (zone-year max)) 

(define (all-applicable-rule/years rules)
  (sort (apply append 
               (for/list ((year (in-range ZIC-MIN ZIC-MAX))) 
                 (applicable-rules-by-year rules year)))
        rule/year
With the above we'll be able to convert zones with rules. What about the zones without rules? We'll just have to fill out the list of the rules ourselves:

(define (zone->rule/years zone)
  (if (zone-rule zone)
      (all-applicable-rule/years (zone-rule zone)) 
      (for/list ((year (in-range ZIC-MIN ZIC-MAX))) 
        (cons (make-rule year year 1 1 '- '(0 0 0 w) 0 "S") year))))
Now each list of the zones are basically mapped to a full list of the rule/years pairs. We want the final outcome to be a list of structure that contains the boundary (a date), the standard offset, and the daylight saving offset:

(define-struct normalized (bound std dst)) 
And our rule/year pair can be converted to normalized with the following:

(define (rule/year->normalized rule/year std-offset offset)
  (define (helper rule year)
    (let ((date (on/year->date rule year)))
      (match (rule-time rule) 
        ((list hour minute second type) ;; wall clock 
         (make-normalized (build-date (date-year date)
                                      (date-month date)
                                      (date-day date)
                                      hour 
                                      minute
                                      second 
                                      #:tz (case type 
                                             ((g u z) 0 0)
                                             ((s) std-offset)
                                             (else ;; wall-clock requires the previous rule...  
                                              (+ std-offset offset))))
                          std-offset
                          (rule-offset rule))))))
  (helper (car rule/year) (cdr rule/year)))
Which looks quite similar to rule/year->date/offset, which we might retrofit with this newer function.

The next challenge will then to be convert from zone into a list of normalized, following the offset dependency between the rules, as well as filtering out the upper and the lower bounds (the lower bounds comes from the previous zone).

Without filtering it looks like the following:

(define (zone->normalized zone (prev '()))
  (define (helper rule/years acc) 
    (cond ((null? rule/years) ;; we are done... 
           acc)
          (else
           (let ((normalized (rule/year->normalized (car rule/years)
                                                    (zone-offset zone)
                                                    (if (null? acc) 
                                                        0
                                                        (normalized-dst (car acc))))))
             (helper (cdr rule/years) (cons normalized acc))))))
  (helper (zone->rule/years zone) prev))
To filter for upper bound, we'll have to test against the UNTIL field. If it's #f, it means no upper bound. Otherwise we should first test for the year value to quickly filter away the ones that exceed the year, and if it's not the same year, we can keep the rule. For the boundary in the same year as UNTIL we'll convert UNTIL into a date object for comparison by using the boundary's wall time:

(define (zone->normalized zone (prev '()))
  (define (until-helper offset)
    (define (helper year month day hour minute second type)
      (build-date year month day hour minute second
                                    #:tz (case type 
                                           ((u g z) 0)
                                           ((s) (zone-offset zone))
                                           ((w) (+ offset (zone-offset zone))))))
    (apply helper (zone-until zone)))
  (define (helper rule/years acc) 
    (cond ((null? rule/years) ;; we are done... 
           acc)
          ((and (zone-until zone) ;; if until & date > until's year. 
                (> (cdar rule/years) (car (zone-until zone))))
           (helper (cdr rule/years) acc))
          (else
           (let ((normalized (rule/year->normalized (car rule/years)
                                                    (zone-offset zone)
                                                    (if (null? acc) 
                                                        0
                                                        (normalized-dst (car acc))))))
             (cond ((not (zone-until zone)) ;; infinite upper bound 
                    (helper (cdr rule/years) (cons normalized acc)))
                   ((< (cdar rule/years) (car (zone-until zone))) ;; less than until's year 
                    (helper (cdr rule/years) (cons normalized acc)))
                   (else ;; we need to ensure the bound is lower than the year
                    (let ((until (until-helper (if (null? acc) 
                                                   0
                                                   (normalized-dst (car acc))))))
                      (if (< until (normalized-bound normalized))
                          (helper (cdr rule/years) (cons normalized acc))
                          (helper (cdr rule/years) acc)))))))))
  (helper (zone->rule/years zone) prev))
To filter out the lower bound, we will test to see if the new normalized bound is greater than the previous batch's bound:

(define (zone->normalized zone (prev '()))
  ... 
  (define (helper rule/years acc) 
    (cond ...
          (else
           (let ((normalized (rule/year->normalized (car rule/years)
                                                    (zone-offset zone)
                                                    (if (null? acc) 
                                                        0
                                                        (normalized-dst (car acc))))))
             (cond ((date<? (normalized-bound normalized) 
                            (if (null? prev) 
                                (build-date 1 1 1) 
                                (normalized-bound (car prev)))) 
                    (helper (cdr rule/years) acc))
                   ...)))))
  (helper (zone->rule/years zone) prev)) 
With the above, we can just fold over the list of zones to get to the list of normalized:

(define (zones->normalized zones)
  (foldl zone->normalized 
         '()
         zones))
From this point on, the rest is to convert the serialization to serialize out the list, and then to modify tz-offset to take use the new list. Stay tuned.

No comments:

Post a Comment