Friday, September 25, 2009

Handling Time Zone in Scheme: Motivation & Overview

Timezones are a difficult problem, more difficult than they have to be. Probably the biggest challenge is daylight saving time, which changes with the whims of politicians and governments. Once in a while, a city or state will simply decide to change its timezone. How should timezone software handle such changes?

In such a case, we might naturally assume the answer is to keep all of the same hours, just in a different timezone. That answer would be sufficient if everything were local, but what if some of the appointments are with people in other timezones?

Luckily such situations do not arise very often, and neither do changes to the daylight saving switch dates. But they illustrate the difficulty of timezone handling.

Insufficient Solution

Most software stores an offset along with a time to denote the timezone's offset from GMT, and sometimes a flag to indicate whether daylight saving time is in effect. In PLT Scheme, the date object has both.

(define-struct date (second minute hour day month year week-day year-day dst? time-zone-offset))
But such a solution is brittle for date manipulation, even without the drastic circumstances above. What if you want to calculate the date four months ahead? How do you know whether the same daylight saving offset should still apply?
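To make the problem concrete, here is a minimal sketch in Racket (the successor to PLT Scheme). The `stamp` struct and `add-months` are hypothetical names invented for this illustration, not part of PLT's date API, and the arithmetic assumes a non-negative month count.

```racket
#lang racket/base

;; Hypothetical record: a wall-clock date plus a fixed GMT offset in seconds.
(struct stamp (year month day hour offset) #:transparent)

;; Naive month arithmetic: roll the month forward and copy the offset as-is.
;; Assumes n is non-negative.
(define (add-months s n)
  (define total (+ (sub1 (stamp-month s)) n))
  (stamp (+ (stamp-year s) (quotient total 12))
         (add1 (remainder total 12))
         (stamp-day s)
         (stamp-hour s)
         (stamp-offset s)))

;; 9 am on March 1, 2009 in Los Angeles, where standard time is GMT-8.
(define march (stamp 2009 3 1 9 (* -8 3600)))
(define july (add-months march 4))

(stamp-month july)   ; 7
(stamp-offset july)  ; -28800, still the winter offset
```

The result still carries -8:00, but Los Angeles is on daylight saving time in July, so the offset should be -7:00. Nothing in the record itself can tell us that.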

A possible solution is to call the C date functions, which can handle the date calculation correctly, depending on the TZ environment variable. The drawback of this approach is that if your program needs to be timezone aware, you'll be constantly swapping your environment. Because the environment is shared by the whole process, doing so effectively serializes all of your threads around each calculation, and it is awkward besides.

Let's see if we can bring timezone handling into Scheme.

Zoneinfo Database

How do the C date functions know how to calculate dates? They consult a database called zoneinfo, which contains past and current timezones and their corresponding offsets. It is the most authoritative source if you need to handle timezones.

On Linux/Mac, type man zic and you will get the details on the format of the zoneinfo source files. The files are line oriented, with comments starting with #. The two main constructs we are interested in are the zones and the rules:

# excerpts from man zic
A zone line has the form

     Zone  NAME                GMTOFF  RULES/SAVE  FORMAT  [UNTIL]

For example:

     Zone  Australia/Adelaide  9:30    Aus         CST     1971 Oct 31 2:00

A rule line has the form

     Rule  NAME  FROM  TO    TYPE  IN   ON       AT    SAVE  LETTER/S

For example:

     Rule  US    1967  1973  -     Apr  lastSun  2:00  1:00  D
Basically, each zone references one or more named rule sets that denote how to figure out the offset for dates within that particular timezone. Below is a more complete example:

# Zone NAME  GMTOFF RULES FORMAT [UNTIL]
Zone America/Los_Angeles -7:52:58 - LMT 1883 Nov 18 12:07:02
   -8:00 US P%sT 1946
   -8:00 CA P%sT 1967
   -8:00 US P%sT
So until Nov 18, 1883, the LA timezone had a GMT offset of -7:52:58 (local mean time, without any rules). It then switched to an offset of -8:00 under the US rules until 1946, followed the CA rules until 1967, and then switched back to the US rules, which still apply. The following are the US & CA rules.
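As a rough sketch of how such a zone entry might be read into Scheme, the Racket code below splits the Zone line and its continuation lines into records. The names `era`, `offset->seconds`, and `parse-zone` are made up for this example, comments are assumed to be stripped already, and the UNTIL field is kept as raw tokens rather than interpreted.

```racket
#lang racket/base
(require racket/string racket/list)

;; Convert "[-]H:MM[:SS]" to seconds, e.g. "-8:00" -> -28800.
(define (offset->seconds str)
  (define negative? (string-prefix? str "-"))
  (define parts
    (map string->number
         (string-split (if negative? (substring str 1) str) ":")))
  (define secs
    (for/sum ([p (in-list parts)] [w (in-list '(3600 60 1))])
      (* p w)))
  (if negative? (- secs) secs))

;; One line of a zone: the offset and rule set in force until UNTIL.
(struct era (gmtoff rules format until) #:transparent)

;; fields = (GMTOFF RULES FORMAT [UNTIL ...])
(define (fields->era fields)
  (era (offset->seconds (first fields))
       (second fields)
       (third fields)
       (drop fields 3)))   ; raw UNTIL tokens, '() when open-ended

;; The first line starts with "Zone NAME"; continuation lines omit both.
;; Returns two values: the zone name and the list of eras in file order.
(define (parse-zone lines)
  (define head (string-split (car lines)))
  (values (second head)
          (cons (fields->era (drop head 2))
                (for/list ([line (in-list (cdr lines))])
                  (fields->era (string-split line))))))

(define la-lines
  '("Zone America/Los_Angeles -7:52:58 - LMT 1883 Nov 18 12:07:02"
    "   -8:00 US P%sT 1946"
    "   -8:00 CA P%sT 1967"
    "   -8:00 US P%sT"))

(define-values (la-name la-eras) (parse-zone la-lines))
(era-gmtoff (first la-eras))  ; -28378
(era-rules (third la-eras))   ; "CA"
```

Keeping the eras in file order means a lookup for a given date only has to walk the list until it finds the first UNTIL that lies after the date.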

# Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
Rule US 1918 1919 - Mar lastSun 2:00 1:00 D
Rule US 1918 1919 - Oct lastSun 2:00 0 S
Rule US 1942 only - Feb 9 2:00 1:00 W # War
Rule US 1945 only - Aug 14 23:00u 1:00 P # Peace
Rule US 1945 only - Sep 30 2:00 0 S
Rule US 1967 2006 - Oct lastSun 2:00 0 S
Rule US 1967 1973 - Apr lastSun 2:00 1:00 D
Rule US 1974 only - Jan 6 2:00 1:00 D
Rule US 1975 only - Feb 23 2:00 1:00 D
Rule US 1976 1986 - Apr lastSun 2:00 1:00 D
Rule US 1987 2006 - Apr Sun>=1 2:00 1:00 D
Rule US 2007 max - Mar Sun>=8 2:00 1:00 D
Rule US 2007 max - Nov Sun>=1 2:00 0 S

# Rule NAME FROM TO TYPE IN ON AT SAVE LETTER
Rule CA 1948 only - Mar 14 2:00 1:00 D
Rule CA 1949 only - Jan  1 2:00 0 S
Rule CA 1950 1966 - Apr lastSun 2:00 1:00 D
Rule CA 1950 1961 - Sep lastSun 2:00 0 S
Rule CA 1962 1966 - Oct lastSun 2:00 0 S
The rules are a bit more complicated. Let's look at a couple of lines from the US rules:

Rule US 1967 2006 - Oct lastSun 2:00 0 S
The above says that from 1967 to 2006, we move to "standard" time (a saving of 0) at 2:00 am (wall clock time) on the last Sunday of October.

Rule US 1976 1986 - Apr lastSun 2:00 1:00 D
Rule US 1987 2006 - Apr Sun>=1 2:00 1:00 D
These lines say that from 1976 to 1986, we switch to daylight saving time (a saving of 1:00) on the last Sunday of April at 2:00 am (wall clock time), but from 1987 to 2006, we make the same switch on the first Sunday of April.
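The ON column has to be turned into a concrete day of the month for each year. Below is a minimal Racket sketch of that step. The function names are hypothetical, only Sunday forms and plain day numbers are handled, and it assumes the Sun>=N result stays within the month. The weekday calculation uses Sakamoto's method, which is one choice among several.

```racket
#lang racket/base

;; Day of week for a Gregorian date, 0 = Sunday (Sakamoto's method).
(define (day-of-week year month day)
  (define t (vector 0 3 2 5 0 3 5 1 4 6 2 4))
  (define y (if (< month 3) (sub1 year) year))
  (modulo (+ y (quotient y 4) (- (quotient y 100)) (quotient y 400)
             (vector-ref t (sub1 month)) day)
          7))

(define (leap-year? y)
  (and (zero? (remainder y 4))
       (or (not (zero? (remainder y 100)))
           (zero? (remainder y 400)))))

(define (days-in-month year month)
  (case month
    [(2) (if (leap-year? year) 29 28)]
    [(4 6 9 11) 30]
    [else 31]))

;; "lastSun": step back from the last day of the month to a Sunday.
(define (last-sunday year month)
  (define last-day (days-in-month year month))
  (- last-day (day-of-week year month last-day)))

;; "Sun>=N": the first Sunday on or after day N.
(define (sunday-on-or-after year month n)
  (+ n (modulo (- 7 (day-of-week year month n)) 7)))

;; Resolve an ON field to a day of the month.
(define (resolve-on year month on)
  (cond
    [(string=? on "lastSun") (last-sunday year month)]
    [(regexp-match #rx"^Sun>=([0-9]+)$" on)
     => (lambda (m)
          (sunday-on-or-after year month (string->number (cadr m))))]
    [else (string->number on)]))

(resolve-on 1986 4 "lastSun")  ; 27
(resolve-on 1987 4 "Sun>=1")   ; 5
(resolve-on 2009 3 "Sun>=8")   ; 8
```

So under the two rules above, the switch fell on April 27 in 1986 and on April 5 in 1987.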

The combination of the zones and the rules gives us enough data to build the logic for timezone conversion, so we can calculate dates correctly within a timezone and convert between timezones correctly.
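One piece of that logic is picking out the rules in force for a given year. The Racket sketch below parses Rule lines and filters them by name and year range. The `rule` struct and both function names are assumptions made for this example, not an existing library, and the TYPE column is ignored.

```racket
#lang racket/base
(require racket/string)

(struct rule (name from to in on at save letter) #:transparent)

;; Parse one non-empty "Rule ..." line; a trailing "# comment" is dropped.
;; Fields: Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
(define (parse-rule line)
  (define fields (string-split (car (string-split line "#"))))
  (define from (string->number (list-ref fields 2)))
  (define to-field (list-ref fields 3))
  (rule (list-ref fields 1)
        from
        (cond [(string=? to-field "only") from]    ; single year
              [(string=? to-field "max") +inf.0]   ; no end year
              [else (string->number to-field)])
        (list-ref fields 5)
        (list-ref fields 6)
        (list-ref fields 7)
        (list-ref fields 8)
        (list-ref fields 9)))

;; All rules of the named set whose FROM..TO range covers the year.
(define (rules-for rules name year)
  (filter (lambda (r)
            (and (string=? (rule-name r) name)
                 (<= (rule-from r) year (rule-to r))))
          rules))

(define us-rules
  (map parse-rule
       '("Rule US 1967 2006 - Oct lastSun 2:00 0 S"
         "Rule US 1967 1973 - Apr lastSun 2:00 1:00 D"
         "Rule US 1974 only - Jan 6 2:00 1:00 D"
         "Rule US 2007 max - Mar Sun>=8 2:00 1:00 D"
         "Rule US 2007 max - Nov Sun>=1 2:00 0 S")))

(map rule-in (rules-for us-rules "US" 2009))  ; '("Mar" "Nov")
(map rule-in (rules-for us-rules "US" 1974))  ; '("Oct" "Jan")
```

With the active rules in hand, each one can be resolved to a concrete transition date for that year, and the SAVE of the most recent transition is added to the zone's GMT offset.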

What we want then is to build a library that can parse the zoneinfo database and use it to manipulate dates.

Stay tuned.
